Going distributed

A version control system goes Distributed by allowing multiple copies of the repository to exist, and work to be done in all those repositories in parallel. So when you start work on an existing project, the first thing to do is to get your own copy of the repository.

elsewhere$ hg clone ssh://jim.home.net/Poem Jim-Poem
updating working directory
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

Mercurial lets you access other repositories via the file system, over http or over ssh.

elsewhere$ cd Jim-Poem
elsewhere$ hg log
changeset:   3:a065eb26e6b9
tag:         tip
user:        Jim Hague <jim.hague@acm.org>
date:        Thu Apr 24 18:52:31 2008 +0100
summary:     Rename my file

changeset:   2:ff97668b7422
user:        Jim Hague <jim.hague@acm.org>
date:        Thu Apr 24 18:50:22 2008 +0100
summary:     Finished first verse

changeset:   1:3d65e7a57890
user:        Jim Hague <jim.hague@acm.org>
date:        Wed Apr 23 22:49:10 2008 +0100
summary:     A great second line

changeset:   0:33596ef855c1
user:        Jim Hague <jim.hague@acm.org>
date:        Wed Apr 23 22:36:33 2008 +0100
summary:     My Pome

$

hg clone is aptly named. It creates a new repository that contains exactly the same changes as the source repository. You could make a clone just by copying your project directory, provided you're confident nothing else will access it during the copy. hg clone saves you this worry, and also sets the default push/pull location of the new repository to the repository it was cloned from.

From that point, you use hg pull to collect changes from other places into your repo (though note that by default it does not update your working copy), and, as you might guess, hg push shoves your changes into a foreign repository. By default both act on the repository you cloned from, but you can name any other repository instead.
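The relationship between clone, pull and push can be sketched with a toy model. The Python below is not how Mercurial works internally; it assumes, purely for illustration, that a repository is just a set of changesets (each recording its parents) plus the changeset the working copy is based on. The class ToyRepo, its methods and the placeholder ID new-on-jim are hypothetical; the other IDs are the short hashes from the log above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyRepo:
    """Toy model (hypothetical, not Mercurial's storage format)."""
    changesets: Dict[str, Tuple[str, ...]] = field(default_factory=dict)  # ID -> parent IDs
    working_parent: Optional[str] = None      # changeset the working copy is based on
    default_path: Optional["ToyRepo"] = None  # where pull/push go if no repo is named

    def tip(self) -> Optional[str]:
        # The most recently added changeset (dicts keep insertion order).
        return next(reversed(self.changesets), None)

    def commit(self, cid: str) -> None:
        # A new changeset whose parent is the working copy's current changeset.
        parents = (self.working_parent,) if self.working_parent else ()
        self.changesets[cid] = parents
        self.working_parent = cid

    def clone(self) -> "ToyRepo":
        # Exactly the same changesets, working copy at the tip,
        # and the default push/pull location set to the source.
        new_repo = ToyRepo(dict(self.changesets), default_path=self)
        new_repo.working_parent = new_repo.tip()
        return new_repo

    def pull(self, source: Optional["ToyRepo"] = None) -> int:
        # Copy in any changesets we lack; the working copy is left alone.
        if source is None:
            source = self.default_path
        missing = [cid for cid in source.changesets if cid not in self.changesets]
        for cid in missing:
            self.changesets[cid] = source.changesets[cid]
        return len(missing)

    def push(self, dest: Optional["ToyRepo"] = None) -> int:
        # Push is simply a pull performed by the other repository.
        if dest is None:
            dest = self.default_path
        return dest.pull(self)


jim = ToyRepo()
for cid in ["33596ef855c1", "3d65e7a57890", "ff97668b7422", "a065eb26e6b9"]:
    jim.commit(cid)

mine = jim.clone()          # roughly "hg clone ssh://jim.home.net/Poem Jim-Poem"
jim.commit("new-on-jim")    # a later change in the source repository
print(mine.pull())          # 1: one changeset was missing
print(mine.working_parent)  # still a065eb26e6b9; pull did not update
```

In this model, as in Mercurial, pull only adds changesets; moving the working copy onto them is a separate step (hg update).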

More on those in a moment. First, though, I want to show you something you can't do in Subversion. Start with the four-changeset repository we just cloned. I want to focus on the first couple of lines, so I'll wind the working copy back to the point where only those lines exist.

$ hg update -r 1
1 files updated, 0 files merged, 1 files removed, 0 files unresolved
$

And make a change.

$ hg diff
diff -r 3d65e7a57890 pome.txt
--- a/pome.txt  Wed Apr 23 22:49:10 2008 +0100
+++ b/pome.txt  Thu Apr 24 19:13:14 2008 +0100
@@ -1,2 +1,2 @@ There was a gibbon one morning
-There was a gibbon one morning
-said "I think I will fly to the moon".
+There was a baboon who one afternoon
+said "I think I will fly to the sun".
$ hg commit -m "Better first two lines"
$

The alert among you will have sat up at that. Well done! Yes, there's something very worrying. How can I commit a change at an old point? If you try this in Subversion, it will complain mightily about your file being out of date. But Mercurial just went ahead and did something. The Bazaar experts among you will know that in Bazaar, if you use bzr revert -r to bring the working copy to a past revision, make a change and commit, then your latest version will be the past revision plus your change. Perhaps that's what Mercurial did?

No. What Mercurial did is central to Mercurial's view of the world. You took your working copy back to an old changeset, and then committed a fresh change based at that changeset. Mercurial actually did just what you asked it to do, no more and no less. Let's see the initial evidence.

$ hg heads
changeset:   4:267d32f158b3
tag:         tip
parent:      1:3d65e7a57890
user:        Jim Hague <jim.hague@acm.org>
date:        Thu Apr 24 19:13:59 2008 +0100
summary:     Better first two lines

changeset:   3:a065eb26e6b9
user:        Jim Hague <jim.hague@acm.org>
date:        Thu Apr 24 18:52:31 2008 +0100
summary:     Rename my file

$

Time for some more Mercurial terminology. A head is a changeset that has no children; you can think of it as the most recent change on a branch. In Mercurial, a branch is simply what happens when you commit a change whose parent already has a child. Mercurial ships with a standard extension, graphlog, whose hg glog command uses some ASCII art to show the current state:

$ hg glog
@  changeset:   4:267d32f158b3
|  tag:         tip
|  parent:      1:3d65e7a57890
|  user:        Jim Hague <jim.hague@acm.org>
|  date:        Thu Apr 24 19:13:59 2008 +0100
|  summary:     Better first two lines
|
| o  changeset:   3:a065eb26e6b9
| |  user:        Jim Hague <jim.hague@acm.org>
| |  date:        Thu Apr 24 18:52:31 2008 +0100
| |  summary:     Rename my file
| |
| o  changeset:   2:ff97668b7422
|/   user:        Jim Hague <jim.hague@acm.org>
|    date:        Thu Apr 24 18:50:22 2008 +0100
|    summary:     Finished first verse
|
o  changeset:   1:3d65e7a57890
|  user:        Jim Hague <jim.hague@acm.org>
|  date:        Wed Apr 23 22:49:10 2008 +0100
|  summary:     A great second line
|
o  changeset:   0:33596ef855c1
   user:        Jim Hague <jim.hague@acm.org>
   date:        Wed Apr 23 22:36:33 2008 +0100
   summary:     My Pome

$

hg view (from the hgk extension) shows a nicer graphical view.[9]

So the change is in there. It's the latest change, and is simply on a different branch to the other changes.
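The heads and the fork shown above can be read straight off the parent links that every changeset records. As a rough sketch (a toy Python model using the local revision numbers from the graph above; the function names are hypothetical and this is not Mercurial's code), a head is a changeset that is nobody's parent, and a branch appears wherever a changeset has more than one child:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Parents of each changeset in the graph above, by local revision number.
# (Real Mercurial identifies changesets by hash; numbers keep this short.)
PARENTS: Dict[int, Tuple[int, ...]] = {
    0: (),
    1: (0,),
    2: (1,),
    3: (2,),
    4: (1,),  # committed after "hg update -r 1"
}


def children_of(parents: Dict[int, Tuple[int, ...]]) -> Dict[int, List[int]]:
    # Invert the parent links to find each changeset's children.
    kids: Dict[int, List[int]] = defaultdict(list)
    for rev, ps in parents.items():
        for p in ps:
            kids[p].append(rev)
    return kids


def heads(parents: Dict[int, Tuple[int, ...]]) -> List[int]:
    # A head is a changeset that no other changeset names as a parent.
    kids = children_of(parents)
    return sorted(rev for rev in parents if not kids[rev])


def branch_points(parents: Dict[int, Tuple[int, ...]]) -> List[int]:
    # Committing onto a changeset that already has a child creates a branch,
    # so every changeset with two or more children marks a fork.
    kids = children_of(parents)
    return sorted(rev for rev, ks in kids.items() if len(ks) > 1)


print(heads(PARENTS))          # [3, 4], the same two heads hg heads reported
print(branch_points(PARENTS))  # [1]
```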

Almost invariably, you will want to bring everything back together and merge the branches. A merge combines two heads back into one. hg merge prepares a working directory containing the merged contents of the two heads for you to review and, if satisfactory, commit; that commit is the merge changeset.

$ hg merge
merging pome.txt and poem.txt
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ cat poem.txt
There was a baboon who one afternoon
said "I think I will fly to the sun".
So with two great palms strapped to his arms,
he started his takeoff run.
$ hg commit -m "Merge first line branch"
$

(I'm no poet. The poem is, of course, Silly Old Baboon by the late, great Spike Milligan, from A Book of Milliganimals, Puffin, 1971.)

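Here's the ASCII art again, showing what just happened. Oh, and notice in the merge above that Mercurial has done the right thing with regard to the rename: the change made to pome.txt on one branch was merged into poem.txt, the renamed file from the other branch.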

$ hg glog
@    changeset:   5:792ab970fc80
|\   tag:         tip
| |  parent:      4:267d32f158b3
| |  parent:      3:a065eb26e6b9
| |  user:        Jim Hague <jim.hague@acm.org>
| |  date:        Thu Apr 24 19:29:53 2008 +0100
| |  summary:     Merge first line branch
| |
| o  changeset:   4:267d32f158b3
| |  parent:      1:3d65e7a57890
| |  user:        Jim Hague <jim.hague@acm.org>
| |  date:        Thu Apr 24 19:13:59 2008 +0100
| |  summary:     Better first two lines
| |
o |  changeset:   3:a065eb26e6b9
| |  user:        Jim Hague <jim.hague@acm.org>
| |  date:        Thu Apr 24 18:52:31 2008 +0100
| |  summary:     Rename my file
| |
o |  changeset:   2:ff97668b7422
|/   user:        Jim Hague <jim.hague@acm.org>
|    date:        Thu Apr 24 18:50:22 2008 +0100
|    summary:     Finished first verse
|
o  changeset:   1:3d65e7a57890
|  user:        Jim Hague <jim.hague@acm.org>
|  date:        Wed Apr 23 22:49:10 2008 +0100
|  summary:     A great second line
|
o  changeset:   0:33596ef855c1
   user:        Jim Hague <jim.hague@acm.org>
   date:        Wed Apr 23 22:36:33 2008 +0100
   summary:     My Pome

$

So, our little branch change has now been merged back, and we have a single line of development again. Notice that unlike the other changesets, changeset 5 has two parent changesets, indicating it is a merge changeset. You can only merge two branches in one operation; or, putting it another way, a changeset can have at most two parents.

This behaviour is absolutely central to Mercurial's philosophy. If a change is committed that takes as its starting point a change that already has a child, then a branch is created. When you work with Mercurial, branches are created frequently, and equally frequently merged back. As befits any frequent operation, both are easy to do.
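In a toy model where each changeset simply records its parents (hypothetical Python, not Mercurial's code, using the revision numbers from the graph above), a merge commit is a changeset with two parent entries, and committing it takes the graph from two heads back to one:

```python
from typing import Dict, List, Tuple

Graph = Dict[int, Tuple[int, ...]]  # toy model: revision -> parent revisions

# The graph just before the merge.
parents: Graph = {0: (), 1: (0,), 2: (1,), 3: (2,), 4: (1,)}


def heads(graph: Graph) -> List[int]:
    # A head is a changeset that no other changeset lists as a parent.
    used = {p for ps in graph.values() for p in ps}
    return sorted(rev for rev in graph if rev not in used)


def commit_merge(graph: Graph, new_rev: int, first: int, second: int) -> None:
    # A merge changeset records both merged changesets as parents.
    # Taking exactly two parents mirrors Mercurial's two-parent limit.
    if first == second:
        raise ValueError("nothing to merge")
    graph[new_rev] = (first, second)


print(heads(parents))           # [3, 4]
commit_merge(parents, 5, 4, 3)  # "hg merge" then "hg commit", starting from 4
print(heads(parents))           # [5]
```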

You're probably thinking at this point that making a commit onto an old version is a slightly strange thing to do, and you'd be right. But that's exactly what's going to happen the moment you go distributed. Two people working independently with their own repositories are going to make commits based, typically, on the latest changes they happen to have incorporated into their tree. To be Distributed, a DVCS has to deal with this, and Mercurial faces it head-on. When you pull changes into your repo (or someone else pushes them), any changes that overlap (that is, are both based on the same base change) give you extra heads, and it's up to you to let these extra heads live or to merge them, as you please.

In practice this is more manageable than you might think. Consider a typical Mercurial usage, where the 'master' repo sits on a known server, and everyone pulls changes from the master and pushes their own efforts to the master. By default Mercurial won't let you push if the receiving repo would gain an extra head as a result, so you typically pull (and do any required merging) just before pushing. Subversion users will recognise this pattern. Subversion won't let you commit a change to a file that is out of date with respect to the repository, so the Subversion user will update, and merge if necessary, just before committing.
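The extra heads, the push rule and the pull, merge, push routine can all be sketched with a toy Python model. It assumes, for illustration only, that a repository is a dict mapping changeset IDs to parent IDs, that pull simply copies in whatever is missing, and that "gain an extra head" means "end up with more heads than before". The functions, the names alice and bob, and the IDs are hypothetical; none of this is Mercurial's code.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Tuple[str, ...]]  # toy model: changeset ID -> parent IDs


def heads(graph: Graph) -> List[str]:
    # Changesets that nothing else names as a parent.
    used = {p for ps in graph.values() for p in ps}
    return sorted(cid for cid in graph if cid not in used)


def pull(dest: Graph, source: Graph) -> None:
    # Copy in whatever changesets the destination is missing.
    for cid, ps in source.items():
        dest.setdefault(cid, ps)


def push(source: Graph, dest: Graph) -> None:
    # Toy version of the default rule: refuse if dest would gain a head.
    after = dict(dest)
    pull(after, source)
    if len(heads(after)) > len(heads(dest)):
        raise RuntimeError("push refused: it would add a head; pull and merge first")
    dest.update(after)


master: Graph = {"base": ()}
alice, bob = dict(master), dict(master)  # two independent clones
alice["a1"] = ("base",)                  # both commit on top of "base"
bob["b1"] = ("base",)

push(bob, master)                        # accepted: master still has one head
try:
    push(alice, master)                  # refused: master would get a1 and b1
except RuntimeError as err:
    print(err)

pull(alice, master)                      # alice now has two heads...
print(heads(alice))                      # ['a1', 'b1']
alice["m1"] = ("a1", "b1")               # ...so she merges and commits,
push(alice, master)                      # and this time the push succeeds
print(heads(master))                     # ['m1']
```

The last three steps are the Mercurial counterpart of the Subversion update, merge, commit routine described above.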

What, then, about a branch in the conventional sense of '1.0 maintenance branch'? Typically in Mercurial you'd handle this by keeping a separate cloned repository for those changes. Cloning is fast, and a local clone uses hard links where possible on filesystems that support them, so it isn't necessarily extravagant on disc space. You can, if you prefer, handle them all in a single repo with 'named branches', but cloning is definitely simpler.

OK, so now you know the basics of using Mercurial. We can proceed to look at how this magic is achieved. In particular, where does the globally unique identifier for a change come from?



Footnotes

[9] Though, being Tcl/Tk based, not that much nicer.
Jim Hague 2009-05-22