Posts filed under 'Version control'
Many of my recent posts here have discussed the diff and diff3 code I wrote in Javascript. A couple of weekends ago I sat down and translated the code into Squeak Smalltalk. The experience of writing the “same code” for the two different environments let me compare them fairly directly.
To sum up, working with Smalltalk was much more pleasant than working with Javascript, and produced higher-quality code (in my opinion) in less time. It was nice to be reminded that there are some programming languages and environments that are actually pleasant to use.
The biggest win was Smalltalk’s collection objects. Where stock Javascript limits you to the non-polymorphic
for (var index = 0; index < someArray.length; index++) {
  var item = someArray[index];
  /* do something with item, and/or index */
}
Smalltalk permits
someCollection do: [:item | "do something with item"].
or, alternatively
someCollection withIndexDo:
  [:item :index | "do something with item and index"].
Smalltalk collections are properly object-oriented, meaning that the code above is fully polymorphic. The Javascript equivalent only works with the built-in, not-even-proper-object Arrays.
Of course, I could use one of the many, many, many, many Javascript support libraries that are out there. The nice thing about Smalltalk is that I don’t have to find and configure an ill-fitting third-party bolt-on collections library. Because the standard library is simple yet rich, I also don’t have to worry about potential incompatibilities between third-party libraries, such as can occur in Javascript if you’re mixing and matching code from several sources.
Other points that occurred to me as I was working:
- Smalltalk has simple, sane syntax; Javascript… doesn’t. (The number of times I get caught out by the semantics of this alone…!)
- Smalltalk has simple, sane scoping rules; Javascript doesn’t. (O, for lexical scope!)
- Smalltalk’s uniform, integrated development tools (including automated refactorings and an excellent object explorer) helped keep the code clean and object-oriented.
- The built-in SUnit test runner let me develop unit tests alongside the code.
The end result of a couple of hours’ hacking is an implementation of Hunt-McIlroy text diff (that works over arbitrary SequenceableCollections, and has room for alternative diff implementations) and a diff3 merge engine, with a few unit tests. You can read a fileout of the code, or use Monticello to load the DiffMerge module from my public Monticello repository. [Update: Use the DiffMerge Monticello repository on SqueakSource.]
If Monticello didn’t already exist, it’d be a very straightforward matter indeed to build a DVCS for Smalltalk from here. I wonder if Spoon could use something along these lines?
It also occurred to me it’d be a great thing to use OMeta/JS to support the use of
<script type="text/smalltalk">"<![CDATA["
(document getElementById: 'someId') innerHTML: '<p>Hello, world!</p>'
"]]>"</script>
by compiling it to Javascript at load-time (or off-line). Smalltalk would make a much better language for AJAX client-side programming.
July 1st, 2008
tonyg
After my talk on Javascript DVCS at the Osmosoft Open Source Show ’n Tell, I went to visit Osmosoft, the developers of TiddlyWiki, to talk about giving TiddlyWiki some DVCS-like abilities. Martin Budden and I sat down and built a couple of prototypes: one where each tiddler is versioned every time it is edited, and one where versions are snapshots of the entire wiki, and are created each time the whole wiki is saved to disk.
| Regular DVCS | SynchroTiddly |
|---|---|
| Repository | The html file contains everything |
| File within repository | Tiddler within wiki |
| Commit a revision | Save the wiki to disk |
| Save a text file | Edit a tiddler |
| Push/pull synchronisation | Import from other file |
If you have Firefox (it doesn’t work with other browsers yet!) you can experiment with an alpha-quality DVCS-enabled TiddlyWiki here. Take a look at the “Versions” tab, in the control panel at the right-hand-side of the page. You’ll have to download it to your local hard disk if you want to save any changes.
It’s still a prototype, a work-in-progress: the user interface for version management is clunky, it’s not cross-browser, there are issues with shadow tiddlers, and I’d like to experiment with a slightly different factoring of the repository format, but it’s good enough to get a feel for the kinds of things you might try with a DVCS-enabled TiddlyWiki.
Despite its prototypical status, it can synchronize content between different instances of itself. For example, you can have a copy of a SynchroTiddly on your laptop, email it to someone else or share it via HTTP, and import and merge their changes when they make their modified copy visible via an HTTP server or email it back to you.
I’ve been documenting it in the wiki itself — if anyone tries it out, please feel free to contribute more documentation; you could even make your altered wiki instance available via public HTTP so I can import and merge your changes back in.
July 1st, 2008
tonyg
We’re using Mercurial here at LShift for much of our development work now, and we’re finding it a great tool. We make heavy use of branches (“branch per bug”) for many projects, and this is also a pretty smooth experience. One issue that has come up is policy regarding merging the trunk (“default”) into any long-lived feature/bug branches: should you do it, or should you not?
My vote is that you should merge default into long-lived branches fairly regularly; otherwise, you have a big-bang, all-at-once nightmare of a merge looming ahead of you. If you do merge frequently, though, there’s one subtlety to be aware of: hg diff is not history aware, so in order to get an accurate, focussed picture of all the changes that have been made on your long-lived branch, you need to do one of two things:
- either, merge default into your long-lived branch right before you merge the long-lived branch back into default, and run hg diff after that’s complete; or
- (recommended) do a throw-away test-merge of the long-lived branch into default directly.
Imagine a history like this:
(2) (3)
 |   |
  \ /
   V
   |
  (1)
… where (1) is an ancestral revision, (2) is the default branch, and (3) is the long-lived branch - let’s call it “foo”.
Given this history, running hg update -C default (to make the working copy be the default branch, i.e. revision (2)) followed by hg diff -r foo will give you a misleading diff - one that compares the two branch heads directly, and so mixes the reverse of the changes from (1) to (3) with the changes from (1) to (2). This is almost certainly not what you want!
Instead, run a test merge, by hg update -C default followed by hg merge foo and then plain old hg diff. Note that this modifies your working copy! You will need to revert (by hg update -C default) if you decide the merge isn’t ready to be committed.
The output of hg diff after the hg merge shows a history-aware summary of the changes that the merge would introduce to your checked-out branch. It’s this history-awareness (“three-way merge”) that makes it so much superior to the history-unaware simple diff (“two-way merge”).
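To make the difference concrete, here is a minimal Javascript sketch of a three-way merge over flat key/value snapshots. It is an illustration only, not how Mercurial works internally: the names merge3 and changedKeys are made up for this example, and values are assumed to be simple primitives. The three snapshots stand in for revisions (1), (2) and (3) from the history above.

```javascript
// Hypothetical helper, not part of Mercurial or of any library mentioned here.
// Values are assumed to be primitives; a missing key reads as undefined.
function merge3(base, ours, theirs) {
  const merged = {};
  const conflicts = [];
  const keys = new Set([
    ...Object.keys(base),
    ...Object.keys(ours),
    ...Object.keys(theirs),
  ]);
  for (const key of keys) {
    const b = base[key];
    const o = ours[key];
    const t = theirs[key];
    let winner;
    if (o === t) {
      winner = o;            // both sides agree (or both deleted the key)
    } else if (o === b) {
      winner = t;            // only "theirs" changed since the ancestor
    } else if (t === b) {
      winner = o;            // only "ours" changed since the ancestor
    } else {
      conflicts.push(key);   // both sides changed the key, differently
      winner = o;            // keep "ours" provisionally
    }
    if (winner !== undefined) merged[key] = winner;
  }
  return { merged, conflicts };
}

// History-unaware comparison: keys whose values differ between two snapshots.
function changedKeys(from, to) {
  const keys = new Set([...Object.keys(from), ...Object.keys(to)]);
  return [...keys].filter((key) => from[key] !== to[key]);
}

const ancestor = { a: 1, b: 2 };      // revision (1)
const defaultHead = { a: 10, b: 2 };  // revision (2)
const fooHead = { a: 1, b: 20 };      // revision (3)

// Two-way view: both keys differ, so the change to "a" on default looks undone.
console.log(changedKeys(defaultHead, fooHead));

// Three-way view: only "b" is new relative to default after the merge.
const result = merge3(ancestor, defaultHead, fooHead);
console.log(changedKeys(defaultHead, result.merged), result.conflicts);
```

The two-way comparison reports both keys, while the comparison after the merge reports only the key that foo actually changed, which is the same effect as running hg diff after a test merge.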
June 19th, 2008
tonyg
Yesterday I presented my work on Javascript diff, diff3, merging and version control at the Osmosoft Open Source Show ‘n Tell. (Previous posts about this stuff: here and here.)
The slides for the talk are here. They’re a work-in-progress - as I think of things, I’ll continue to update them.
To summarise: I’ve used the diff3 I built in May to make a simple Javascript distributed version-control system that manages a collection of JSON structures. It supports named branches, merging, and import/export of revisions. So far, there’s no network synchronisation protocol, although it’d be easy to build a simple one using the rev import/export feature and XMLHttpRequest, and the storage format and repository representation is brutally naive (and because it doesn’t yet delta-compress historical versions of files, it is a bit wasteful of memory).
You can try out a few browser-based demos of the features of the diff and DVCS libraries:
The code is available using Mercurial by hg clone http://hg.opensource.lshift.net/synchrotron/ (or by simply browsing to that URL and exploring from there). It’s quite small and (I hope) easily understood - at the time of writing,
- the diff/diff3 code and support utilities are ~310 lines; and
- the DVCS code is ~370 lines.
The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.
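For illustration, a naive ancestor search of the kind mentioned above might look like the following sketch. This is not the code from the repository: the shape of the revision graph (a plain object mapping each revision id to an array of parent ids) and the function names are assumptions made for this example. It returns the common ancestor nearest to the second revision by hop count, which is not always the best merge base in a graph with criss-cross merges.

```javascript
// Assumed repository shape (hypothetical): { revisionId: [parentId, ...] }.

// Collect a revision and everything reachable from it through parent links.
function ancestorsOf(parents, rev) {
  const seen = new Set();
  const stack = [rev];
  while (stack.length > 0) {
    const current = stack.pop();
    if (seen.has(current)) continue;
    seen.add(current);
    for (const parent of parents[current] || []) stack.push(parent);
  }
  return seen;
}

// Walk breadth-first from revB and return the first revision that is also
// reachable from revA, or null when the two revisions share no history.
function naiveCommonAncestor(parents, revA, revB) {
  const reachableFromA = ancestorsOf(parents, revA);
  const queue = [revB];
  const visited = new Set();
  while (queue.length > 0) {
    const current = queue.shift();
    if (visited.has(current)) continue;
    visited.add(current);
    if (reachableFromA.has(current)) return current;
    for (const parent of parents[current] || []) queue.push(parent);
  }
  return null;
}

// A small forked history: "1" is the ancestor of both "2" and "3".
const history = { "1": [], "2": ["1"], "3": ["1"] };
console.log(naiveCommonAncestor(history, "2", "3")); // "1"
```

The ancestor found this way is what a three-way merge would use as its base when combining the two heads.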
The code is split into a few different files:
June 6th, 2008
tonyg
Last weekend I finally revisited the diff-in-javascript code I’d written a couple of years back, adding (very simple) patch-like and diff3-like functionality.
On the way, not only did I discover Khanna, Kunal and Pierce’s excellent paper “A Formal Investigation of Diff3”, but I found revctrl.org, the revision-control wiki, which I’m just starting to get my teeth into. I’m looking forward to learning more about merge algorithms.
The code I wrote last weekend is available: just download diff.js. The tools included:
Diff.diff_comm - works like a simple Unix comm(1)
Diff.diff_patch - works like a simple Unix diff(1)
Diff.patch - works like a (very) simple Unix patch(1) (it’s not a patch on Wall’s patch)
Diff.diff3_merge - works like a couple of the variations on GNU’s diff3(1)
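As a rough idea of what sits underneath tools like these, here is a minimal sketch of a line diff built on a longest-common-subsequence table. It is not the diff.js implementation or its API: it uses a plain dynamic-programming table rather than the Hunt-McIlroy algorithm, and the names lcsTable and simpleDiff are invented for this example.

```javascript
// table[i][j] holds the LCS length of a.slice(i) and b.slice(j).
function lcsTable(a, b) {
  const table = [];
  for (let i = 0; i <= a.length; i++) {
    table.push(new Array(b.length + 1).fill(0));
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      table[i][j] = a[i] === b[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  return table;
}

// Walk the table from the start, classifying each line as unchanged,
// removed from the first sequence, or added by the second.
function simpleDiff(a, b) {
  const table = lcsTable(a, b);
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push({ op: "same", line: a[i] });
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      out.push({ op: "remove", line: a[i] });
      i++;
    } else {
      out.push({ op: "add", line: b[j] });
      j++;
    }
  }
  while (i < a.length) out.push({ op: "remove", line: a[i++] });
  while (j < b.length) out.push({ op: "add", line: b[j++] });
  return out;
}

const edits = simpleDiff(["a", "b", "c"], ["a", "x", "c"]);
console.log(edits.map((e) => e.op + ":" + e.line).join(","));
// same:a,remove:b,add:x,same:c
```

The table costs memory proportional to the product of the two lengths, which is fine for small inputs but is one reason real diff tools use more careful algorithms.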
Read on for some examples showing the library in action.
May 9th, 2008
tonyg
(Continued from Moving away from CVS)
The wealth of options for a replacement for CVS presents us with a problem. We can’t choose a version control system by comparing feature lists: what seems perverse when presented in the manual may become natural in real use (which is the reaction many have to CVS’s “merge-don’t-lock” way of working at first), and contrarily what seems attractive on paper may prove problematic in real use (the system may claim sophisticated merging, but will it actually do what you want given your version history?). Equally, however, trying to use every system in anger would impose a very serious cost: unless we write the infrastructure for every system we test, some live project will have to do without it while they try out the shiny new system, and for every system someone will have to undergo the considerable expense of really learning how to use it and make it behave well. So we have to find ways to at least thin the candidate list.
April 30th, 2008
Paul Crowley
When LShift first started off in 2000, the only real option for mature, open source version control was CVS. We’ve used CVS for most of our projects since then, and gone on to develop a strong infrastructure for managing CVS-backed projects, including a web interface for viewing versions, a web-based searchable database for related CVS commits (“CVSzilla”) which infers transactions from multiple simultaneous commits, and integration with the Bugzilla bug tracker.
Today, there are many other options, and I’ll discuss six major alternatives here: Subversion, Monotone, darcs, Git, Bazaar, and Mercurial.
April 24th, 2008
Paul Crowley