Introduction

What is Mercurial?

Mercurial, or hg, is a distributed source code management (SCM) system. Source code management systems, also called "version control systems", are tools that allow groups of programmers to work together on the same projects without getting in each other's way any more than is necessary. They do this in three primary ways: first, they identify a particular collection of files as "the project"; second, they keep track of changes so multiple copies of the project (belonging to multiple programmers) can be kept synchronized; and third, they keep track of history so previous versions of the project can be examined. A good SCM system helps keep a project organized and provides ways for the programmers involved with it to coordinate with one another and do their work in parallel.

A distributed SCM system is one that allows collaboration among multiple programmers without having to set up a centralized server for the project. Older centralized SCM systems required this; it is often inconvenient or administratively expensive. The major conceptual difference is that while a centralized SCM system has a designated master copy of the repository, distributed SCM systems do not. Different copies of the repositories are peers and no one system has a special status, although a group of programmers can decide by convention to treat some copy specially.

If you have used CVS or Subversion, you will find Mercurial somewhat different. If you have used git, Monotone, or BitKeeper you will find Mercurial to be conceptually similar. (And if you have never used any of these, don't worry.)

Why source management?

In most computer science courses, each assignment is a distinct unit: you sit down and code something up, you hand it in, it gets graded, and you immediately forget about it or even throw it away. In this environment, a distributed source management system is not really necessary. (You might still find an SCM system useful to help you keep track of the changes you make in your assignment as you work on it.)

In the real world, however, most programs are large and expensive to develop, so their life-cycles are measured in years or sometimes decades. Over these time scales, and with such large amounts of code, just keeping track of everything becomes a major problem.

Worse, in the real world, programs have users, who are not part of the development team and not (generally) interested in internal details of the program. Usually, someone insists that now and then a new version be made available to the users. The development team has to be able to issue these releases, and then also has to be able, for years afterwards, to handle bug reports and sometimes issue fixes. In this environment it is imperative to be able to go to some official place and get a copy of the precise release you need.

And finally, when you have a number of programmers working on the same program at once, it's essential that some mechanism be put in place to allow them to coordinate their work. Otherwise, each programmer's version slowly diverges from the others, and eventually everyone has a private version different from everyone else's... and none of them work. Once this happens, it takes an immense amount of effort to straighten out the mess.

Source code management (or version control) systems are designed to help programming teams handle these issues.

Why Mercurial?

We use Mercurial because it is powerful, freely and widely available, easy to learn, and commonly used. Many large open-source and proprietary projects are managed using Mercurial.

Remainder of This Document

The rest of this handout is divided into two main sections. The first explains the philosophy of Mercurial, its operating model, and the assumptions behind the way it works. The second explains a number of basic and not-so-basic Mercurial operations, organized by what you actually want to do rather than by command. A small additional section lists the main Mercurial commands.

You do not need to remember everything in this handout. In fact, this handout was written mostly so you don't have to remember it all. When you read it the first time, make sure you have a good mental model of how Mercurial works and how we are using it. Then, use this as a reference to find out how to do particular tasks.

The World According to Mercurial

Distributed Model

The Mercurial model assumes that there is no one central official master copy of anything. Instead, many copies (called "clones") can be made. Each clone has two distinct parts: the project history and the actual working copy of the files that comprise the project. It will be much simpler to work with Mercurial if you think of these two parts as distinct entities, because certain commands work on specific parts. The history is organized so that you can exchange changes among clones without actually changing the contents of the files that comprise your project. In many projects, one particular clone somewhere is considered "official" for administrative purposes; however, from a technical standpoint this clone is no different from any other. Clones are also often called "repositories".

When you wish to work on a Mercurial-managed project, you create your own private clone of the project.

Since work on a program is an ongoing process, and other people may be working on the same program at the same time, you generally want changes made by other developers to appear in your own private clone. Accomplishing this requires two things: exchanging histories to know what changed and then modifying your local clone's working files to reflect those changes.

The act of requesting and receiving history from another clone is called "pulling" and is done with "hg pull". When you clone a Mercurial repository, your new clone contains a link back to the parent repository. If you pull without specifying a repository from which to pull, this parent is the clone from which you will pull changes. You can also pull new changes from any other clone of the same project that you have access to; this allows, for example, collaborating with someone else on changes that aren't yet ready for prime time.

Push is the inverse of pull: it sends your changes to another repository. As with pull, the default location to which a push applies is the parent repository, but you can also specify any clone to which you have sufficient access. (The only difference between "pull" and "push" is which direction data flows in.)

Note, however, that changes do not propagate automatically. You must explicitly pull changes to get them into your private clone when they become available elsewhere. And you must explicitly push changes if you want them to appear somewhere else. You and your collaborators should discuss what your group policy will be for pushing and pulling. (Pushing changes that don't compile or will break your collaborators' repositories will probably not be conducive to group harmony.)

Working Copy of the Project

Recall that your clone has two pieces: the history and the actual files. Push and pull exchange histories, but do not update the working files in a local (or remote) clone.

To apply new changes to the working copy, you must update. Often, this is the first thing you do after pulling new changes; for this reason there is an option to the pull command to immediately also do an update. However, this behavior is not the default. (This point often confuses people used to CVS and Subversion, where "pull" and "update" are inherently rolled together into a single action called "update".)

When you have made modifications to the files in your working copy, before you can share them with anyone else you must first make them into official changes by committing them. This enters them into the project history. Then you can push them to other repositories. Changes that have not been committed do not officially exist, will not be pushed, and cannot be seen in anyone else's clones.

Accessing Repositories

Mercurial provides a variety of ways for accessing repositories/clones that might be anywhere on the Internet. A repository name can be the name of a (local) directory; however, it can also be a URL. Repositories can be accessed over HTTP (http://...), HTTP with SSL (https://...), and also via ssh: ssh://machinename/path.

You can have as many clones of your own as you want. Often you will have only one. However, it often makes sense to have a separate clone on every machine or cluster you work on. This allows you to work locally in various contexts, and the clones can be synced up easily. There are also circumstances in which it's convenient to have several or even many clones on the same machine, and sometimes you'll make temporary ones.

Any clone can itself be cloned, and sometimes you'll use this ability on your own private clones. Just remember that each clone works independently, and that changes can be pulled or pushed from any clone to any other clone according to whatever structure you wish. (If you set up a complicated structure, though, it's usually a good idea to also make a README file documenting that structure.) Some examples of working setups with multiple local clones are given later on.

Changesets and Merging

Like most source control systems, Mercurial does not lock files for modification. This would not work at all in a distributed environment. Nor are files merged haphazardly as they typically are in centralized systems like CVS. Instead, each batch of changes someone commits, called a changeset, is treated as a new version of the project. Changesets are identified by applying a cryptographic hash to the changes; this gives a long hexadecimal code number that globally identifies the changeset. For convenience, Mercurial also provides a sequence number within the local repository for each changeset; these are easier to work with but are different in each clone. Don't try to use them when talking to your partner; you'll just both get confused. (The sequence depends on what changes were pushed or committed when and in what order.)

Each changeset is based on some specific earlier version. If several changes are committed "simultaneously", each based on the same previous version, then each one becomes a new "head", or lightweight branch. New versions can be committed after any or all of them, or after any other version anywhere in the project history. These new versions can diverge arbitrarily if desired; however, generally that is not desired, so multiple heads are combined by merging.

Merges of changesets that change disjoint sets of files are easy. Merges of changesets that make small unrelated changes in the same file will also go through automatically. However, overlapping edits result in merge conflicts which need to be resolved by hand editing. If two or more people have made different sets of sweeping changes to the same file at once, this editing can become a nightmare. For this reason, while in the distributed model anyone can commit anything anytime, it is always a good idea to coordinate with other programmers on the project before embarking on major or intrusive rewrites.

If a merge fails, files that require editing will have blocks in them that look like this:

@@ -1 +1,5 @@
  int foo(void) {
  <<<<<<< local
     bar();
  =======
     baz();
  >>>>>>> other
  }

This means that you changed foo to call bar, but the other version you merged with changed it to call baz. To fix this you must pick one or the other (or edit into some other form entirely) and remove the markings. Mercurial remembers which files need hand-merging, so you need to tell it that you fixed ("resolved") each broken file before it will let you commit the merge. Note however that there's no way it can cross-check what you did; it's up to you to do the right thing and not mark files resolved until they really are.

Because merging two changesets is itself a change, you must commit it for it to officially exist. If you forget to commit the merge, or you forget to do a merge at all, or someone has committed and pushed other changes upstream that you haven't pulled and merged yet, pushing your changes will fail. When Mercurial says "push creates new remote heads", this is what happened.

If you have modified files in your working tree and you update to integrate new changes, Mercurial will merge the new changes with your modifications. This can also result in merge conflicts in the modified files that need to be resolved by hand, and you still need to mark them resolved. However, once you've done that you can keep editing until you're ready to commit your changes; you don't need to commit the merge explicitly.

Some notes on merges:

  1. Even when there are conflicts, the conflict blocks do not necessarily reflect all the changes associated with the merge. Some may have merged successfully. Occasionally, they may have merged "successfully" but be wrong. If in doubt, look at diffs.
  2. Also, sometimes the conflict block delimiters don't contain everything that may be involved in resolving a conflict correctly.
  3. While the merge system is reasonably robust, once in a while it makes a mistake, particularly if some but not all of the changes merged. It's prudent to look at diffs after an automatic merge, just in case. Mercurial has a fairly sophisticated merge algorithm, but it's still only an algorithm and has no brain.
  4. Merging is painful. Merging a big change is a lot more painful than merging the same amount of change a bit at a time. Update early and often. Commit early and often.

If you are planning to make huge changes to a file, like reordering all the functions or moving large blocks of code into if clauses (which changes the indent, making Mercurial think everything changed), it's a good idea to coordinate manually with anyone else who might have pending changes to the file.

Log Messages

When you commit changes to a Mercurial repository, Mercurial gives you the opportunity to provide a message explaining the change. These messages get saved in the project history and can be reviewed later using hg log. This can be quite useful when trying to reconstruct the thinking that led to some piece of code you wrote months previously.

These messages can also be logged to some central point or mailed out to the people working on the project. It is possible to set up your Mercurial repository to mail commit messages to you and your partner. (See below.) While the volume of mail thus generated can be irritating, there's no better way to stay in touch with what's going on.

The commit message should thus describe (briefly) what you did and why. The first line should be a short summary; this is all hg log prints unless you give it the -v option. There's no need to report the exact changes, as they can be retrieved using hg diff.

When to Commit and Push

The general rule for commits is that any change should be committed as soon as you're reasonably certain that it's correct and appropriate in the long term. This is also the general rule for pushing: as soon as you're reasonably certain that a change is correct and appropriate in the long term, you should push it out so it's available to the people you're working with... subject to the proviso that committing and pushing many small changes in quick succession tends to annoy people.

Because in Mercurial (and other distributed SCM systems) commit is local and separated from push, you can commit as much as you want whenever you want without interfering with anyone else's work, and wait to push until you're ready to inflict your changes on your partner. This means that sometimes it makes sense to commit changes that you know aren't correct, just to checkpoint them, or so you can push your current code to your laptop so you can work remotely.

Remember that your partner won't see anything you don't push to him, and you can't push newer changesets before older ones. It often makes sense to make quick bug fixes in a separate clone from the big new feature you're working on; then they can be pushed independently. This way you have to merge them into the tree with your big new feature, but that's usually not difficult.

Ideally you and your partner should keep track of which tests you expect to work at any particular time, and before committing and pushing out new code check to make sure that they all still do work.

In most cases, one should try to avoid pushing out changes that cause the program to stop working properly (or even stop compiling at all). This rule can sometimes be profitably bent when you know your partner will not be affected by the errors introduced.

Tags

While Mercurial uses global version numbers, unlike the per-file version numbers used by older tools like CVS, these versions are hash codes, not numbers, and are annoying to work with. Mercurial supports a concept known as a tag, which is a symbolic name (such as asst4-debugged) that you attach to a particular version of your project. You can then refer to that version of those files with the name.

See below for specific directions for manipulating tags with Mercurial.

Branches

Sometimes you might have more than one "line of development" in your program. For instance, when you ship release 1.0 to customers, you might have one team working on release 2.0, and another team making minor bug fixes to the release 1.0 code for release 1.01.

In this case, most changes made for release 2.0 should not be incorporated into release 1.01, and while many fixes made for release 1.01 should be incorporated into release 2.0, some probably shouldn't be.

This sort of situation is handled using "branches". Each branch is a (mostly) separate line of development, diverging from some common ancestor version. (This divergence is where the term "branch" arose.)

In Mercurial every new changeset is potentially its own branch, as discussed above. These lightweight branches are generally short-lived and joined together again rapidly by merging. Mercurial also supports named branches; this feature allows attaching a symbolic name to a branch that is meant to be a first-class citizen and/or continue for a substantial length of time. Named branches are primarily an organizational tool; using them is beyond the scope of this documentation.

Use Mercurial Effectively

Mercurial (or any version control system) is a tool, not a panacea. It helps you organize and maintain a project, but it doesn't do it by itself. It requires that you use it in a manner that makes it useful.

In order for the system to be useful for keeping track of what's really part of the project and what isn't, you have to actively maintain the set of files Mercurial knows about. Don't add or commit temporary files, editor backups, object files, and the like to the Mercurial project history. If you have files that are complicating your development process that you do not want to commit, add them to the .hgignore file at the top level of the tree. (This is discussed in more detail below.) Do remove files you're not using any more. (You can still get them back later, because they're part of the project history and removing them is just a change that Mercurial tracks.)

In order for the version history to be useful, you have to add tags at important points in development, like releases. You also have to write at least minimally useful commit messages so you can look at them later and be reminded of the circumstances.

In order for the merging features to be useful, you have to avoid making sweeping changes without warning your partner, you have to pull and update regularly, you have to commit and push regularly but not insanely often, and you have to take the trouble to merge correctly by hand when conflicts occur.

If you don't do these things, you will eventually end up in a hole, and Mercurial will not save you from yourself.

How do I...

The previous section explained Mercurial concepts in general terms. In this section we explain how to do various useful things.

How do I set up my username?

Mercurial wants you to declare your username before you commit anything. Edit the file .hgrc in your home directory and add text like the following:

   [ui]
   username = foo

How do I make a new repository?

Make an empty directory and run hg init in it. This creates an empty project history and an empty set of working files.

For most of your courses, however, the repository will be created by the instructor on the webserver, and you will be given instructions on how to clone it.

How do I make a new project in a repository?

You don't - in almost all cases if you have a separate project you want a separate repository for it.

How do I clone an existing repository?

With hg clone, like this:

   % hg clone http://USERNAME@140.141.132.4:8001/REPONAME

How do I add stuff to a repository?

Create some files and/or directories and run hg add on them:

   % hg add newfile

If you run hg add without any arguments it will add all the new files it can find. Don't forget to commit; adding only modifies the state of the working files.

How do I import a lot of existing code into a repository?

Unlike some tools, Mercurial has no special function for importing an external source tree. So just unpack the tree you want to import into the place you want it in the repository (make a temporary clone if you want to be cautious) and then use hg add. Again, don't forget to commit, and if you made a temporary clone, don't forget to push the resulting changeset back to your main clone and then update the main clone.

How do I see what's been changed?

Run hg status. This will show the status of the whole working tree by default, or you can run it on individual files or subdirectories. This will show which files have been modified; it will also show files that exist but Mercurial doesn't know about, and also any files that do not exist but should. In general the latter two cases should be rectified; files that exist but are not tracked by Mercurial should either be added or explicitly ignored, and files that are missing should either be restored or explicitly removed. Checking the status is useful before committing, and before updating, and when you first sit down to work to remind you where you were, and generally at any other time too.

How do I ignore stray files?

In most projects, compiling causes build products (.o files, for example) to appear in the tree. These will then show up with a ? when you run hg status, which is untidy and gets in the way of seeing real status information. To ignore files, create a file .hgignore in the top level directory of the project and add to it regular expressions matching the pathnames of the files you want Mercurial to ignore. For example,

   ~$
   \.o$
   ^build/

ignores all Emacs backup files and .o files anywhere, and anything underneath the top level directory called build. Mercurial can also ignore files using shell glob patterns (like *.o) instead of regular expressions; however, the regular expression method generally works better. See the Mercurial docs for more info.
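To get a feel for what those three patterns match, you can try them against a few pathnames with grep -E. This is only an approximation: grep is standing in for Mercurial's own matcher, which applies the expressions to pathnames relative to the top of the repository, and the pathnames below are hypothetical.

```shell
set -e
# Hypothetical pathnames, relative to the top of the repository.
paths=$(printf '%s\n' \
    'main.c~' \
    'src/kern/vm.o' \
    'build/kernel' \
    'src/build/notes.txt' \
    'src/kern/vm.c')

# The same three expressions as in the .hgignore example.
ignored=$(printf '%s\n' "$paths" | grep -E -e '~$' -e '\.o$' -e '^build/')
kept=$(printf '%s\n' "$paths" | grep -v -E -e '~$' -e '\.o$' -e '^build/')

echo "would be ignored:"
echo "$ignored"
echo "would still show up:"
echo "$kept"
```

Note that src/build/notes.txt is not matched, because ^build/ is anchored to the top level. In a real repository, hg status -i lists the files Mercurial is actually ignoring.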

How do I remove files?

When you wish to remove a file or directory from the tree, run hg rm on it. This will both delete the file itself and record the deletion in Mercurial. Commit at a suitable point afterwards.

It's usually a good idea to compile the project after removing but before committing, just to make sure you aren't breaking things.

Remember that files that have been deleted are still kept in the project history. They'll disappear from people's working trees by default, but you can still look at them, and you can bring them back again later if needed.

How do I add and remove directories?

You don't. Mercurial doesn't track directories as such; directories are created if there are files to put in them, and are removed automatically when the last file is removed. This makes dealing with directories almost entirely transparent.

How do I rename things?

Run hg mv. Like hg rm, this both moves the file and records the move in Mercurial, and you should commit afterwards. Note that version history is preserved across the rename. You can also rename whole directories full of files with a single hg mv.

How do I pull new changes?

Use hg pull. You can specify a repository to pull from, but if you don't, Mercurial will pull from the repository you cloned from, if any. You can change this, and add abbreviations for other repositories you commonly pull from (or push to), by editing the file .hg/hgrc at the top of your working tree. This is the Mercurial control file. See below for more information.

Can I check for new changes without pulling?

Yes: hg incoming.

How do I update my working files after pulling?

Use hg update. This will always update the whole tree to the latest version, merging any uncommitted changes you may have.

You can also update your working files to the state of the project as of a specific version by using the -r option with a version hash code or sequence number, or as of a particular date by using -d. This will also merge any uncommitted changes.

Note that you can update to an old version, make changes, and commit; this will create a new changeset that branches off from that old version. Sometimes this is useful.

How do I look at an old version without updating the whole tree?

Like this:

   % hg cat -rff3984e1 file

How do I commit my changes?

Use hg commit. You can commit directories or individual files. You can use the -m option to supply a commit message on the command line; if you don't, Mercurial will invoke the editor. As with most commands, if you do not specify anything to commit explicitly, Mercurial commits all changes in the whole project.

   % hg commit foo.c -m "changes to foo"

or

   % hg commit src/kern -m "changes to kern"

Remember that changes that have not been committed, including adding and removing files, cannot be pushed out to other developers. Also, after committing, don't forget to push.

How do I push out my changes after committing?

Use hg push; this is exactly the same as hg pull except that it sends changesets in the other direction. Likewise, hg outgoing will show what you have to send out without actually doing it.

How do I set up commit messages to be mailed out?

If people are interested we can help set this up. Note that what you probably want is for the messages to be mailed out when you push changes to where your partner can get them, not when you first commit them.

How do I create a tag?

   % hg tag mytagname

This adds a line to the tags file .hgtags and (unlike other Mercurial operations) also implicitly commits the file. Note that adding a tag appears as its own changeset; this produces some oddities. In particular, the tag itself does not exist in the version the tag names; it cannot, because the hash identifying the tag changeset can't be computed until the contents (which include the hash of the changeset the tag names) are known. This is rarely a problem in practice but it appears odd the first time you run into it.

The version tagged will be the version that your working tree is based on, unless you supply options to name some other version. (And of course, uncommitted changes cannot be tagged.)

If you add the wrong tag or whatever you can edit and commit the .hgtags file manually. Sometimes you may get to merge the .hgtags file by hand too. If the same tag name appears more than once, the latest (lowest in the file) takes priority. Also, note that hg always uses the latest .hgtags file in the repository for looking up tags, regardless of the state of the working files.

How do I export a release?

Use hg archive. This is somewhat similar to hg clone, but produces a snapshot without a copy of the project history. Mercurial can prepare the snapshot in various forms, including already tarred up and compressed. Note that you must run hg archive from inside the repository you want to export from, and to avoid making a mess you should generally tell it to produce the snapshot somewhere that is outside the repository. For example:

   % hg archive -r asst4-final ~/asst4.tar.gz

This will generate a snapshot of the version previously tagged asst4-final, tarred up in a file called asst4.tar.gz in your home directory.

How do I make diffs?

Use hg diff. Specify the files or directory trees you wish to compare; if you do not specify anything, by default the whole project is diffed.

By default your working tree is diffed against the version in the project history to which it was last updated. You can diff against a specific version or tag by using the -r option, as with other commands. You can diff two specific versions by providing two such options.

If you want to see the latest changes that you've pulled but not yet updated, use -rtip as one of the arguments.

You can also provide many of the normal diff format options. The -w option causes diff to ignore whitespace changes; the -p option prints which C function each change hunk appears in. See the diff man page for more information, but note that in Mercurial the diff -uN options are on by default.

How do I find out where a particular line of code appeared?

The hg annotate command prints each line of the file with a prefix containing the sequence number of the version in which the line appeared. This number can be fed to hg log for more information. You can use the -r option to examine the file as it existed in any previous version.

This can be used in conjunction with hg diff to track down the history of individual lines of code, as long as they haven't moved around very much.

How do I look at the project history?

With hg log. By default it prints the short summary for every version in the project. To get the complete commit messages (and also the complete list of filenames modified, which can sometimes be large) use -v. You can look at specific revisions using -r. You can look at the diff (patch) for each revision using -p.

How do I back out a bad commit?

It's late at night and you foolishly/accidentally commit some immensely stupid change that breaks everything. (We've all been there; if you haven't yet, you will eventually.)

All is not lost. Part of the role of Mercurial is to keep track of old versions; you can extract the old version and re-commit it, or you can tell Mercurial to unmerge the change. Suppose you can determine that version aa4387471ea8 was the last "good" version of the code. You can return your code to that state using the following command:

   % hg revert --all -r aa4387471ea8

(Recent versions of Mercurial refuse to revert unless you either name specific files or give --all.) This updates your working files to the state associated with version aa4387471ea8, but (unlike hg update) it only changes the contents, not Mercurial's idea of what version you have. So the files will now show up as modified to hg status and doing hg commit will return things to the previous state.

If you did something else stupid, like committing with the wrong commit message, and you haven't pushed the resulting changeset yet, you can back out the last operation that modified the repository by doing

   % hg rollback

Note that you can only do this once, and it can't itself be undone. Read hg help rollback before trying this at home.

How do I move one of my repositories/clones?

Just move it. Nothing in a Mercurial repository cares where the directory tree it lives in is. If you have references to the repository location in hgrc files you will want to update those.

And how do I organize my clones?

When you're working alone, generally you'll use one master copy, or one copy each on several systems so you can work remotely. You might make temporary clones from time to time, but mostly there isn't any reason to set up lots of clones, particularly when you're taking a course and working on one assignment at a time. (In real life, you often have multiple things in progress at once on the same project, and then you typically want a separate clone for each.)

Once you start working with a partner, you also need to share versions with your partner. There are two basic ways to organize this. One is to pick a place to use as the master repository, whether it's in your home directory or your partner's; then each of you clones it and pushes to and pulls from it.

The other is to each set up a (semi-)public repository that the other can read from (but not write to); then you push to your own public repository and pull from your partner's. In this environment it is often convenient to have, in addition to a main working repository, a clone of your partner's repository that you pull and update but never modify directly; this lets you look at the stuff your partner sends you (and perhaps compile and test it) before pulling it into your own working area.

Each of these models has advantages and disadvantages. Either way, be sure to set the repository permissions correctly so the rest of the class can't peek in. If in doubt about permissions, ask the course staff.

For more information

To get a list of Mercurial commands, you can type hg help; you can get the options for each command with hg help <command>, and there is also help available on certain other topics such as date and time strings. There are also man pages.

The Mercurial web site has quite a bit of documentation, including a project wiki. There is also a freely downloadable Mercurial book.


Adapted with permission from http://www.eecs.harvard.edu/~margo/cs161/web/resources/mercurial.html.