Yesterday I visited Osmosoft, the developers of TiddlyWiki, to chat about getting some DVCS-like functionality into TiddlyWiki. Jeremy mentioned in passing that TiddlyWiki is, if you squint, a slightly cheating kind of a Quine. It struck me this morning that TiddlyWiki has strong similarities to another famous almost-Quine, the Smalltalk system.
TiddlyWiki is a composition of
an HTML page, with embedded JavaScript, acting as a container for the pieces of content, called tiddlers, in the wiki;
some tiddlers, specially marked as plugins, containing source code permitting extension of the skeleton system; and
other tiddlers containing wiki markup text.
The HTML/JavaScript container of the tiddlers (along with the web browser it runs on!) is like the Smalltalk virtual machine. The tiddlers themselves are the live objects in the system. Running the embedded plugin scripts is like running Smalltalk’s post-image-load startup hooks. Keeping plugins in tiddlers is like keeping source in the image. Saving the TiddlyWiki instance to disk is like a Smalltalk instance extracting its live object graph and serializing it to disk.
The main difference I can see is in how much of the VM travels with the image. A Smalltalk image relies on an external VM being present and carries less of its VM around with it; TiddlyWiki can likewise rely on the browser being present for a big chunk of its VM, but has to carry the bootstrapping container code bundled with the stored tiddlers/live-objects. Also, TiddlyWiki instances are (these days!) constructed by a special assembly process from small, separate text files checked into Subversion, whereas Smalltalk images were not historically constructed from scratch very often at all. Finally, TiddlyWiki’s boot process is heavier than Smalltalk’s, because it’s forced by the browser to recompile all the sources in the system, where Smalltalk gets away with having bytecode in the image alongside the sources.
Yesterday I presented my work on Javascript diff, diff3, merging and version control at the Osmosoft Open Source Show ‘n Tell. (Previous posts about this stuff: here and here.)
The slides for the talk are here. They’re a work-in-progress - as I think of things, I’ll continue to update them.
To summarise: I’ve used the diff3 I built in May to make a simple JavaScript distributed version-control system that manages a collection of JSON structures. It supports named branches, merging, and import/export of revisions. So far, there’s no network synchronisation protocol, although it’d be easy to build a simple one using the rev import/export feature and XMLHttpRequest. The storage format and repository representation are brutally naive, and because the system doesn’t yet delta-compress historical versions of files, it is a bit wasteful of memory.
You can try out a few browser-based demos of the features of the diff and DVCS libraries:
a demo of a Javascript DVCS, a bit like Mercurial, that manages a collection of JSON objects (presenting them in a file-like way, for the purposes of the demo).
The code is available using Mercurial by hg clone http://hg.opensource.lshift.net/synchrotron/ (or by simply browsing to that URL and exploring from there). It’s quite small and (I hope) easily understood - at the time of writing,
the diff/diff3 code and support utilities are ~310 lines; and
the DVCS code is ~370 lines.
The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.
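To make the LCA step concrete, here is a minimal sketch of a naive ancestor selection. It is not the code from the repository: the `parents` map (revision id to list of parent ids) and the function names are assumptions made up for illustration, and the algorithm is the simplest one I could write down, not an efficient one.

```javascript
// Hypothetical revision graph shape: { revisionId: [parentId, ...] }.
// This is an assumption for illustration, not the real repository format.
function ancestorsOf(parents, rev) {
  // Collect rev itself plus everything reachable through parent links.
  const seen = new Set();
  const stack = [rev];
  while (stack.length > 0) {
    const r = stack.pop();
    if (seen.has(r)) continue;
    seen.add(r);
    for (const p of parents[r] || []) stack.push(p);
  }
  return seen;
}

function leastCommonAncestors(parents, a, b) {
  const ancestorsOfA = ancestorsOf(parents, a);
  const common = [...ancestorsOf(parents, b)].filter((r) => ancestorsOfA.has(r));
  // Keep a common ancestor only if no *other* common ancestor descends from it.
  return common
    .filter((c) => !common.some((d) => d !== c && ancestorsOf(parents, d).has(c)))
    .sort();
}

// Two branches forked from r1: a single merge ancestor.
const simpleHistory = { r0: [], r1: ["r0"], r2: ["r1"], r3: ["r1"] };

// A criss-cross history: two equally good candidates.
const crissCross = { base: [], x: ["base"], y: ["base"], m1: ["x", "y"], m2: ["x", "y"] };

const simpleResult = leastCommonAncestors(simpleHistory, "r2", "r3");
const ambiguousResult = leastCommonAncestors(crissCross, "m1", "m2");
```

When the result has more than one entry, as in the criss-cross case, the choice of merge ancestor is ambiguous and some further policy is needed to pick one. The sketch also recomputes ancestor sets repeatedly, which is fine for a demo and much too slow for a large history.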
The code is split into a few different files:
The sources for the diff and diff3 demos and the DVCS demo. In the latter, check out the definition of presets.preset1 for an example of how to use the DVCS, and presets.ambiguousLCA for an example of the repository format and the use of the revision import feature.
Clinton will probably drop out of the race in the next few days, so let’s give the diagram showing both of them one last airing. This looks at a month’s worth of polling data to give a picture of how their relative chance of victory has changed over time - it’s an animated GIF, so you’ll need to have GIF animation enabled in your browser.
She’s moved from being a percentage point below Obama to two percentage points ahead of him. What changed so much over the course of May? My guess is simply that people who aren’t natural Democratic voters are more likely to feel warmth towards Clinton the further the nomination gets from her grasp, and we’d be seeing the exact opposite picture if it were Clinton who was expecting the concession call any day now.
So, I’ve been working on this project recently. In this project there’s no use of version control - in fact, we don’t even have staging or development environments. All changes were just made to the live server, in an ad hoc way, by a variety of people. And inevitably we’ve ended up in a situation where we don’t fully understand what we have, and we’re scared to change anything in case something else breaks.
Sounds awful, right? Chances are that you’ve been working on this project too. Of course I’m referring to system administration. (In fact I’m referring to Unix system administration since we don’t have many Windows servers; the rest of this post will be rather Unix-centric.)
Clearly this situation is not right. Over the years we’ve made a few attempts to improve things - the first attempt consisted of putting /etc in version control. This is clearly a step forward - you can see who’s changed what, when - but there are still several problems. For a start, anything that happens outside /etc is not recorded - so we don’t know which packages are installed or if any changes have been made to the rest of the system. For another thing, there’s no mechanism for abstraction - if you need to make one change to all your servers there’s no sensible way to express that.
So the second attempt consists of writing scripts to build servers, and putting them in version control instead. This is another step forward - look ma, we can do package management! We can do anything! There are still problems though. We can’t just write a script which runs once to build the server; things change and we need to use the same mechanism for maintenance. So although it’s a good feeling to be programming again, we don’t really want to use an imperative language (or even really a functional one) because it’s a pain to make our scripts idempotent (and worse still, to test that they are). And while abstraction is possible in this environment, it’s not exactly convenient.
So we arrive at our third attempt to fix things, and I have a good feeling about this one. I’ve recently been starting to use Puppet, and I think it solves all the problems mentioned so far, as well as some others. It allows all your system administration to be done in one place and automatically replicated out to all your servers. It uses a declarative language that’s designed for idempotence - you describe where you want to be, not how to get there. Abstraction between servers is really easy and quite powerful.
I’ve been going around the office with a messianic glint in my eye for the last couple of days, so I’ll try really hard to think about the negatives. The language has some magic quoting rules I’m not entirely happy with. The project is new-ish and the documentation could be better (although it could certainly be a lot worse too). But that’s about it.
I will say this though: after only a couple of days of work, I’ve got to a state where we can roll out some common foundations to all our servers (use LDAP authentication, use our Debian mirror, automatically apply security updates), and where a couple of our servers are entirely Puppetised - it’s a one-liner to recreate them from our Mercurial repository.
That’s a really nice place to be. And you know what? It wasn’t even that hard.
Upon browsing the source to the excellent MochiWeb, I came across a call to a function that, when I looked, wasn’t defined anywhere. This, it turns out, was a clue: Erlang has undocumented syntactic support for late-bound method dispatch, i.e. lightweight object-oriented programming!
The following example, myclass.erl, is a parameterized module, a feature that arrived undocumented in a recent Erlang release. Parameterized modules are explored on the ‘net here and here. (The latter link is to a presentation that also covers an even more experimental module-based inheritance mechanism.)
“Instances” of the “class” called myclass can be created with myclass:new(A, B) (which is automatically provided by the compiler, and does not appear in the source code), where A and B become values for the variables Instvar1 and Instvar2, which are implicitly scoped across the entirety of the myclass module body, available to all functions defined within it.
The result of a call to new is a simple tuple, much like a record, with the module name in the first position, and the instance variable values in order following it.
While this looks really similar to OO dispatch in other languages, it’s actually an extension to Erlang’s regular function call syntax, and works with other variations on that syntax, too:
4> {myclass,123,234}:getInstvar1().
123
The objects that this system provides are pure-functional objects, which is unusual: many object-oriented languages don’t clearly separate the two orthogonal features of late-binding and mutable state. A well-designed language should let you use one without the other, just as Erlang does here: in Erlang, using parameterized modules for method dispatch doesn’t change the way the usual mechanisms for managing mutable state are used. “Instance variables” of parameterized modules are always immutable, and regular state-threading has to be used to get the effects of mutable state.
I’d like to see this feature promoted to first-class, documented, supported status, and I’d also very much like to see it used to structure the standard library. Unfortunately, it’s not yet very well integrated with existing modules like gb_sets, ordsets and sets. For example, here’s what happens when you try it with a simple lists call:
Not exactly what we were after. (Although it does give brittle insight into the current internals of the rewrites the system performs: a {foo, ...}:bar(zot) call is translated into foo:bar(zot, {foo, ...}) - that is, the this parameter is placed last in the argument list.)
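The shape of that rewrite is easy to mimic outside Erlang. The following is only a model of the argument ordering described above, written in JavaScript; it says nothing about how the Erlang compiler or runtime actually implements the feature, and `modules` and `callOn` are hypothetical names invented for the sketch.

```javascript
// Stand-in for Erlang's module namespace (hypothetical, for illustration only).
const modules = {
  myclass: {
    // Zero-argument "methods": the instance tuple arrives as the extra, last argument.
    getInstvar1: (self) => self[1],
    getInstvar2: (self) => self[2],
    // A one-argument "method", to show that ordinary arguments come first.
    addToInstvar1: (n, self) => n + self[1],
  },
};

// Models the rewrite {Mod, ...}:Fun(Args...) -> Mod:Fun(Args..., {Mod, ...}).
function callOn(instance, fun, ...args) {
  const moduleName = instance[0]; // the first tuple element names the module
  return modules[moduleName][fun](...args, instance);
}

// An array plays the role of the tuple {myclass,123,234}.
const obj = ["myclass", 123, 234];
const first = callOn(obj, "getInstvar1");
const summed = callOn(obj, "addToInstvar1", 1);
```

Seen this way, it is clear why existing modules don’t cooperate: a function that expects its data structure as the first argument will receive it as the last one instead.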
Last weekend I finally revisited the diff-in-javascript code I’d written a couple of years back, adding (very simple) patch-like and diff3-like functionality.
On the way, not only did I discover Khanna, Kunal and Pierce’s excellent paper “A Formal Investigation of Diff3”, but I found revctrl.org, the revision-control wiki, which I’m just starting to get my teeth into. I’m looking forward to learning more about merge algorithms.
The code I wrote last weekend is available: just download diff.js. The tools included are:
Diff.diff_comm - works like a simple Unix comm(1)
Diff.diff_patch - works like a simple Unix diff(1)
Diff.patch - works like a (very) simple Unix patch(1) (it’s not a patch on Wall’s patch)
Diff.diff3_merge - works like a couple of the variations on GNU’s diff3(1)
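For readers who haven’t met comm(1)-style output, here is a small self-contained sketch of the idea, based on a textbook longest-common-subsequence table. It is not the code in diff.js and doesn’t use its API: `lcsTable`, `commLike` and the "both"/"only1"/"only2" tags are names made up for this illustration.

```javascript
// table[i][j] holds the length of the longest common subsequence of a[i..] and b[j..].
function lcsTable(a, b) {
  const table = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      table[i][j] = a[i] === b[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  return table;
}

// Walk the table, tagging each line as common to both inputs or unique to one of them.
function commLike(a, b) {
  const table = lcsTable(a, b);
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(["both", a[i]]);
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      out.push(["only1", a[i]]);
      i++;
    } else {
      out.push(["only2", b[j]]);
      j++;
    }
  }
  // Whatever is left over belongs to one side only.
  while (i < a.length) out.push(["only1", a[i++]]);
  while (j < b.length) out.push(["only2", b[j++]]);
  return out;
}

const tagged = commLike(["a", "b", "c"], ["a", "x", "c"]);
```

Here `tagged` marks "a" and "c" as common, "b" as present only in the first input and "x" as present only in the second. The table costs time and memory proportional to the product of the input lengths, so this is a teaching sketch, not something to run on large files.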
Read on for some examples showing the library in action.