Choosing a new version control system
(Continued from Moving away from CVS)
The wealth of options for a replacement for CVS presents us with a problem. We can’t choose a version control system by comparing feature lists: what seems perverse when presented in the manual may become natural in real use (which is the reaction many have to CVS’s “merge-don’t-lock” way of working at first), and conversely what seems attractive on paper may prove problematic in real use (the system may claim sophisticated merging, but will it actually do what you want given your version history?). Equally, however, trying to use every system in anger would impose a very serious cost: unless we write the infrastructure for every system we test, some live project will have to do without it while its developers try out the shiny new system, and for every system someone will have to undergo the considerable expense of really learning how to use it and make it behave well. So we have to find ways to at least thin the candidate list.
We first narrow the list to the six candidates mentioned in the previous post: Subversion, Monotone, darcs, Git, Bazaar, and Mercurial. All of these have a sizeable community behind them and are used by popular projects. This means they have demonstrated themselves fit for purpose, and that there is a community who will provide help if we encounter problems, and code to support integration with other pieces of software. Other candidates may have interesting properties, but to choose them would be to be relatively out on our own; their lack of popularity also increases the risk that they will simply be abandoned after we have invested in them. In particular, this eliminates Codeville, the innovative DVCS designed by BitTorrent inventor Bram Cohen, though there seems little reason to pick it up in any case now that its main selling point, a smart text merging algorithm, has been picked up by Bazaar and could later be supported by some of the other systems if it is found to be usefully superior.
Of the six, the non-distributed Subversion is the first to be thrown out. This isn’t because we expect to benefit greatly from the possibility of disconnected operation, though it may prove useful sometimes; it is because we would like the other features of DVCSes described in the last article, in particular history-aware merging, and the general cleanliness of the underlying model. It’s a difficult decision, because Subversion has by far the best tool support of all of our candidates, including a mature Eclipse plugin; however, this is a decision we need to make based on the long-term future, and we anticipate that if we can pick a system that will remain popular then such support is just a matter of waiting for the tools to catch up.
The remaining five are very hard to choose between; I’ve had a hard time even finding discussion of how to choose one, because most articles focus on how each one is better than CVS or Subversion rather than comparing them to their DVCS peers. All are licensed under the GPL.
Monotone is the oldest of the five remaining candidates, and the first that I took an interest in. It has an attractively clean model of how a DVCS should work, and is in many ways the “most decentralized” of the five, because of the way it handles authentication. In any other DVCS, if I pull from your repository or allow you to push to mine, I am implicitly trusting you as a source of good revisions that I might like to build on. In Monotone, revisions are cryptographically signed, and it is these signatures that decide which revisions I will pay attention to; as a result, Monotone servers exchange not assertions but facts, and you don’t have to go to a particular server to get “authoritative” information on which is the right revision.
However, these signatures represent an unsolved management headache: how do you decide which keys to trust? As things stand, everyone has to update their keyring when a new developer joins the project. In February of last year, I attended a week-long Monotone developers’ summit in San Francisco hosted by Google, and my sole personal goal while there was to find a better solution. I met a great many very smart people and we had some fascinating discussions around the idea of “policy branches” to solve this problem, but we were never able to agree on exactly how such branches should work, and as far as I know the problem is still unsolved.
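To make the keyring problem concrete, here is a minimal Python sketch of signature-based trust. It is a toy model under stated assumptions, not Monotone’s actual implementation: the names `Revision` and `accepted` and the key IDs are hypothetical, and the cryptographic verification of each signature is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """A revision plus the IDs of the keys whose signatures verified (hypothetical model)."""
    rev_id: str
    signed_by: frozenset


def accepted(revisions, keyring):
    """Return the IDs of revisions carrying at least one signature from a trusted key."""
    return [r.rev_id for r in revisions if r.signed_by & keyring]


# Revisions may arrive from any server; where they came from plays no part.
revisions = [
    Revision("r1", frozenset({"alice@example.com"})),
    Revision("r2", frozenset({"carol@example.com"})),  # carol has just joined
]

my_keyring = frozenset({"alice@example.com", "bob@example.com"})
print(accepted(revisions, my_keyring))  # carol's revision is ignored

# Every developer has to make this update by hand when carol joins.
my_keyring = my_keyring | {"carol@example.com"}
print(accepted(revisions, my_keyring))  # now both revisions are accepted
```

The decision depends only on the local keyring, which is what makes the model so decentralized, and also why adding one developer means touching everyone’s configuration.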
Experiments with using Monotone internally showed other problems. Monotone repositories have a single global lock, so if for example a repository is made available in a web interface you can’t commit to it at the same time, a problem we were able to work around only with some very nasty hacks using multiple repositories. The same problem makes email notification hooks difficult to write, with the additional constraint that they must be written in an obscure interpreted language called Lua, and if more than one hook is to be run for the same event, the programmer must handle this themselves. Monotone itself is written in an eclectic style of C++ that makes it very hard to hack on or even understand what is happening internally. Finally, Monotone tends to be slow in normal use. Overall, we didn’t find working with Monotone to be an enjoyable experience, and we started looking at other candidates.
darcs has its supporters in this office. It’s written in Haskell, the statically typed pure functional programming language which had a place on our “Language du jour” whiteboard for much more than a day. It has by far the best support for “cherry-picking” (pulling in a change to a branch without pulling in all the changes that led to it) thanks to the “algebra of patches” that underlies its operation. However, this model is also what puts me off about it: it is very hard for darcs to cleanly support binary files, for example, because they aren’t well expressed by patches, and patches underlie every part of darcs including the storage and network formats; the other DVCSes have binary storage and network formats and consider the line-oriented nature of files only at merge time. To embed the assumption that all files are line-oriented text files so deeply into the architecture of a DVCS seems to me like a wrong turn that it would be very hard to back out of, so I kept looking.
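The idea behind patch-based cherry-picking can be sketched in a few lines of Python. This is a drastically simplified illustration, not darcs’s real patch theory: it only knows about line insertions, it ignores deletions, conflicts, and dependencies between patches, and the names `Insert` and `commute` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Toy patch: insert `text` as a new line at position `index`."""
    index: int
    text: str

    def apply(self, lines):
        return lines[:self.index] + [self.text] + lines[self.index:]


def commute(first, second):
    """Reorder two consecutive patches.

    Given `first` then `second`, return (second2, first2) such that applying
    second2 and then first2 produces the same document.
    """
    if second.index > first.index:
        # second landed below first's line, so without first it moves up by one
        return Insert(second.index - 1, second.text), first
    # second landed at or above first's line, so first moves down by one
    return second, Insert(first.index + 1, first.text)


base = ["a", "b", "c"]
add_x = Insert(1, "X")  # recorded first
add_y = Insert(3, "Y")  # recorded second, on top of add_x

picked, rest = commute(add_x, add_y)
print(picked.apply(base))              # the second change alone
print(rest.apply(picked.apply(base)))  # both changes, in the other order
```

Cherry-picking `add_y` means commuting it past `add_x` so that it applies directly to the base. Note that the whole scheme is defined in terms of line positions, which is exactly the assumption that fits binary files so badly.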
That leaves three: Git, Bazaar, and Mercurial. All three date from around 2005, when Larry McVoy withdrew the limited license grant on his proprietary BitKeeper DVCS and the Linux kernel had to find a replacement in a hurry, a disaster for kernel development that vividly demonstrated the short-sightedness of Linus’s policy of trying to pretend that software licences don’t matter. All three have been chosen by major projects: Git is used most famously by the Linux kernel, Bazaar by Ubuntu’s Launchpad development centre, and Mercurial by the Java and Mozilla projects. A full evaluation of all three would be a fantastically costly exercise, so we had to use more superficial characteristics to decide which one to explore next.
Git is Linus’s own creation, started (I’m told) when Linus learned that the lead Monotone dev was on holiday and wasn’t about to start hacking on Monotone to improve performance until his return. To be sure, Git has very impressive performance, but there are several areas of concern: Git has over a hundred subcommands, betraying a lack of focus in interface design, and Win32 support (essential for us) is poor. In the end I felt I didn’t have faith in Git’s technical direction; I got the feeling that it was too wedded to a worse-is-better philosophy in which performance is more important than a clean model. To us this meant that it would take reports of crippling performance problems from other systems before we’d reassess Git.
The choice between Bazaar and Mercurial was in some ways the most arbitrary. Both are written in Python, and both have a strong supporting community with lots of extensions – these two are not unrelated, as the choice of Python as implementation language lowers the barriers to getting involved. Each has a comparison page about the other, cross-linked, indicating their relative strengths, and updated as each draws features and ideas from the other or shoots ahead in an area it was formerly behind. There have even been joint Bazaar/Mercurial summit meetings hosted by Canonical, which resulted not in either project subsuming the other but in a rapid cross-fertilization of ideas. In the end I chose based on my feel for which had the clearest architectural vision, and based on the choices other projects have made, in particular projects which I felt would be good at making good choices, such as Java and Coyotos. Other LShift developers agreed: the choice was Mercurial.
Since then we’ve used Mercurial in anger for several projects, and done quite a bit of infrastructure work, integrating Mercurial with other tools that we use and otherwise making it more useful to us. So how’s it been working out for us? We’ll cover that in Part Three…