technology from back to front

Archive for May, 2008

Does Clinton beat Obama?

Here’s the graph of polling error against probability of victory we saw before, updated for the latest polling data:


So if you assume that the polls will be out by 5% or less, then Obama’s chances of victory against McCain are over 60%, but they dip under 60% when you assume a larger polling error.

However, that accounts only for random error in the polls; we assume that each state’s final tally will differ from today’s polls by a normally distributed value which is totally independent in each state. In reality, there is likely to be a systematic, country-wide change in his standing against McCain between now and voting.

Suppose he gains one percentage point countrywide. We can simulate that by just adding 1% to his margin in each poll and re-running the test:


With 1% extra his chance of victory is now between 60% and 80%, depending on how big you think the random error is. Now let’s take off 1%:


and we see that if he loses 1% he has less than a 50% chance of winning the election. Finally let’s bracket that with estimates for +2% and -2%:


Here’s the same picture, going by Clinton’s poll results against McCain:


which shows that if we assume a 5% error in the polls (a rather low estimate) then Clinton can lose 2% of her popularity nationally and still be virtually guaranteed victory.

Here are the two superimposed:


We see that if we assume a 10% error in the polls, then Clinton can lose 2% of her national popularity and still be ahead of where Obama is now.

No-one is talking about Clinton’s apparent huge advantage over Obama in the Electoral College. Is there some systematic error in the polling? Are Rush Limbaugh fans claiming to be Clinton supporters, as per his encouragement, to throw the Democrats into confusion? Or should the few remaining uncommitted superdelegates be throwing themselves behind the losing nomination candidate, in a desperate bid to keep the one who will guarantee them victory in November?

Paul Crowley

Managing servers with Puppet

So, I’ve been working on this project recently. In this project there’s no use of version control – in fact, we don’t even have staging or development environments. All changes are just made to the live server, in an ad hoc way, by a variety of people. And inevitably we’ve ended up in a situation where we don’t fully understand what we have, and we’re scared to change anything in case something else breaks.

Sounds awful, right? Chances are that you’ve been working on this project too. Of course I’m referring to system administration. (In fact I’m referring to Unix system administration since we don’t have many Windows servers; the rest of this post will be rather Unix-centric.)

Clearly this situation is not right. Over the years we’ve made a few attempts to improve things – the first attempt consisted of putting /etc in version control. This is clearly a step forward – you can see who’s changed what, when – but there are still several problems. For a start, anything that happens outside /etc is not recorded – so we don’t know which packages are installed or if any changes have been made to the rest of the system. For another thing, there’s no mechanism for abstraction – if you need to make one change to all your servers there’s no sensible way to express that.

So the second attempt consists of writing scripts to build servers, and putting them in version control instead. This is another step forward – look ma, we can do package management! We can do anything! There are still problems though. We can’t just write a script which runs once to build the server; things change and we need to use the same mechanism for maintenance. So although it’s a good feeling to be programming again, we don’t really want to use an imperative language (or even really a functional one) because it’s a pain to make our scripts idempotent (and worse still, to test that they are). And while abstraction is possible in this environment, it’s not exactly convenient.
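To make the idempotence pain concrete, here’s the sort of helper that imperative build scripts fill up with – a hypothetical sketch, not code from our actual scripts. Notice how much of it exists purely to make re-running safe:

```python
import os

def ensure_line(path, line):
    """Append `line` to `path` only if it isn't already there,
    so that running the script twice leaves the file unchanged."""
    if os.path.exists(path):
        with open(path) as f:
            if line in (l.rstrip("\n") for l in f):
                return False  # already present; nothing to do
    with open(path, "a") as f:
        f.write(line + "\n")
    return True  # we changed something
```

In a declarative system the “check before acting” half disappears: you state that the line must be present, and the tool works out whether anything needs doing.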

So we arrive at our third attempt to fix things, and I have a good feeling about this one. I’ve recently been starting to use Puppet, and I think it solves all the problems mentioned so far, as well as some others. It allows all your system administration to be done in one place and automatically replicated out to all your servers. It uses a declarative language that’s designed for idempotence – you describe where you want to be, not how to get there. Abstraction between servers is really easy and quite powerful.

I’ve been going around the office with a messianic glint in my eye for the last couple of days, so I’ll try really hard to think about the negatives. The language has some magic quoting rules I’m not entirely happy with. The project is new-ish and the documentation could be better (although it could certainly be a lot worse too). But that’s about it.

I will say this though: after only a couple of days of work, I’ve got to a state where we can roll out some common foundations to all our servers (use LDAP authentication, use our Debian mirror, automatically apply security updates), and where a couple of our servers are entirely Puppetised – it’s a one-liner to recreate them from our Mercurial repository.

That’s a really nice place to be. And you know what? It wasn’t even that hard.


Late-binding with Erlang

Upon browsing the source to the excellent MochiWeb, I came across a call to a function that, when I looked, wasn’t defined anywhere. This, it turns out, was a clue: Erlang has undocumented syntactic support for late-bound method dispatch, i.e. lightweight object-oriented programming!

The following example, myclass.erl, is a parameterized module, a feature that arrived undocumented in a recent Erlang release. Parameterized modules are explored on the ‘net here and here. (The latter link is to a presentation that also covers an even more experimental module-based inheritance mechanism.)

-module(myclass, [Instvar1, Instvar2]).
-export([getInstvar1/0, getInstvar2/0]).
getInstvar1() -> Instvar1.
getInstvar2() -> Instvar2.

“Instances” of the “class” called myclass can be created with myclass:new(A, B) (new is automatically provided by the compiler, and does not appear in the source code). A and B become the values of the variables Instvar1 and Instvar2, which are implicitly scoped across the entire body of the myclass module and so are available to every function defined within it.

The result of a call to new is a simple tuple, much like a record, with the module name in the first position and the instance variable values following it in order.

Eshell V5.6  (abort with ^G)
1> Handle = myclass:new(123, 234).
{myclass,123,234}
2> Handle:getInstvar1().
123
3> Handle:getInstvar2().
234

While this looks really similar to OO dispatch in other languages, it’s actually an extension to Erlang’s regular function call syntax, and works with other variations on that syntax, too:

4> {myclass,123,234}:getInstvar1().
123

The objects that this system provides are pure-functional objects, which is unusual: many object-oriented languages don’t clearly separate the two orthogonal features of late-binding and mutable state. A well-designed language should let you use one without the other, just as Erlang does here: in Erlang, using parameterized modules for method dispatch doesn’t change the way the usual mechanisms for managing mutable state are used. “Instance variables” of parameterized modules are always immutable, and regular state-threading has to be used to get the effects of mutable state.

I’d like to see this feature promoted to first-class, documented, supported status, and I’d also very much like to see it used to structure the standard library. Unfortunately, it’s not yet very well integrated with existing modules like gb_sets, ordsets and sets. For example, here’s what happens when you try it with a simple lists call:

5> lists:append([1, 2], [3, 4]).
[1,2,3,4]
6> {lists, [1, 2]}:append([3, 4]).
[3,4|{lists,[1,2]}]

Not exactly what we were after. (Although it does give some insight into the current – and no doubt brittle – internals of the rewrite the system performs: a {foo, ...}:bar(zot) call is translated into foo:bar(zot, {foo, ...}) – that is, the this parameter is placed last in the argument list.)
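The rewrite described above is easy to state in any language. Here’s a minimal Python sketch of the same translation, with a dict of modules standing in for Erlang’s code server – all the names here are hypothetical, and this is only an illustration of the dispatch rule, not of how Erlang implements it:

```python
# A "module" is a dict of functions; an "instance" is a tuple whose
# first element names the module, as in Erlang's parameterized modules.
MODULES = {
    "myclass": {
        "getInstvar1": lambda this: this[1],
        "getInstvar2": lambda this: this[2],
    }
}

def call(handle, method, *args):
    """Emulate {Mod, ...}:method(Args): look up `method` in the module
    named by handle[0], and pass the instance tuple as a trailing
    'this' argument, just as the Erlang rewrite does."""
    fn = MODULES[handle[0]][method]
    return fn(*(args + (handle,)))
```

So `call(("myclass", 123, 234), "getInstvar1")` returns 123, mirroring `{myclass,123,234}:getInstvar1()`.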


A touchscreen mod for the Asus Eee 701

Tony and I bought Asus Eee PCs a couple of months ago, largely to experiment with. He’s been using his quite a bit, and had some fun installing Ubuntu. Mine has been languishing at home, waiting for its tooth on the cog to come around.

Recently, someone pointed me at a YouTube video of unrepentant modder jkkmobile fitting a touchscreen to his Eee PC. The cog turned, and I ordered a specialised touchscreen overlay kit from eBay – they are quite easy to find.

There are four bits to the kit: the touchscreen itself, which has a ribbon; a touchscreen controller PCB; and two cables, one from the ribbon to the PCB, and one from the PCB to USB. Some kits (like the one I bought) purport to be solderless; however, the cable meant to connect PCB to USB ended in the familiar A-type plug, and had a choke (which we had a bit of fun destroying).

The overlay is a 4 wire resistive touchscreen panel, meaning that it works with a finger, a (capped) pen, a PDA stylus, or whatever. One just has to be careful not to use anything sharp or inky.

The first task was to take it all apart and determine where and how to accommodate the controller. Taking it apart consists mostly of two activities: unscrewing screws and popping the various clips in the case. A fiddly bit is taking the keyboard connector out, which requires especial delicacy because the housing is quite sticky and it’s easy to scratch the ribbon – this goes for the touchpad connector too.

As jkkmobile mentions, there’s a bit of space here and there for the controller. I didn’t want to sacrifice a speaker, as he did, and I wasn’t planning to do any more modding, so we used the space-of-least-effort, which is beside the memory housing on the underneath of the motherboard. We used a bit of insulation tape to separate the board from the motherboard; it took a few layers, since the pins on the underside wanted to poke through, and we had to trim the plug casings (other people have removed them altogether and soldered the cables on).

With the controller board in roughly the right place and cables plugged in, we routed the wires up through the gap between the motherboard and the case. It’s probably worth wrapping some tape around, so the edges don’t strip the sheath away.

The next choice was what to connect the USB endpoint to. USB cabling consists of four (sometimes five) wires: two for data, one (or two) earth, and one +5V power. On this, jkkmobile says that an external port would be fine except for the power lead, which ought to be connected to a power source that turns off when the PC does. I was happy to sacrifice one of the three external USB ports, so the earth, D+, and D- went to the pins of the lonesome port on the left-hand side of the case. tnkgirl helpfully provides a guide to the various USB traces on the motherboard, from which we chose a seemingly spare one (and verified its behaviour with a multimeter). We also had to check which colour wire corresponded to which pin, again with the multimeter – this is left to vendors by the USB specification, though there is a convention.

Then there was soldering. At this stage I would like to point out that the last time either Tony or I abused a soldering iron was under heavy supervision when we were about twelve. Nonetheless, we were game, and a couple of hours, countless watts and a burnt finger bought us some shiny conductive solder joints. We ended up using some conducting epoxy to glue the earth to the port shielding, but it could equally (health and safety concerns aside) have been soldered to the pin.

The penultimate task was to attach the touchscreen and plug it in. I used double-sided tape along each edge of the LCD display. I left the backing on while I cleaned the screen one last time, then lined up the panel and pressed it on, with the ribbon at the bottom where there’s a bit of room out of the way of the camera. The panel overlapped the LCD by about 5mm at the left edge, so I had to trim the plastic clip on the fascia and cope with there being a bit of a bulge on that side.

The cable from the controller to the screen had to go from the keyboard part of the case to the screen part of the case, so I routed it through the hinge along with the camera and left-speaker wires. It took some persuasion, but there’s actually plenty of room to spare, both through the hinge and over the motherboard (the mic and speaker plug housings rise a good few mm above the board, for example) – certainly enough for the cable to go through a couple of 90° bends. The ribbon from the panel is robust enough to be folded too, though perhaps don’t score it first.

While putting it all back together, we had a heart-stopping moment as it started refusing to boot past the initial firmware. At first we thought it might be the keyboard, and in particular the keyboard connector ribbon – these are often very sensitive, and known to be so for the Eee. However, we are nothing if not scientific here at LShift, and we soon discovered that it was not the keyboard, but in fact some other problem that disappears when the motherboard is prodded in a certain way. After defouling some wiring and cycling its state of assembly, it booted consistently again.

Last came the software. The Xandros distribution happily recognises the USB controller, loads the usbtouchscreen module, and treats it as a mouse. However, it needs calibration, so it ends up being an insane teleporting mouse. A fair amount of interaction with Google led us to believe that the answer was to treat it as a different input device; that is, stop it looking like a mouse, and have a special input driver for it.

What normally happens is that it gets assigned something like /dev/mouse1, which is interleaved into /dev/input/mice, to which the X Window pointer input driver listens. To stop it looking like a mouse, it’s necessary to have a specific rule for it in /etc/udev/rules.d/:

KERNEL=="event*", SUBSYSTEM=="input", SYSFS{idVendor}=="0eef", SYSFS{idProduct}=="0001", SYMLINK+="input/evtouch_event"

(The values for idVendor and idProduct come from examining the output of cat /proc/bus/usb/devices.)

I compiled a specialised X input driver for it – to do that, I had to install gcc, and to do that I had to add repositories to /etc/apt/sources.list – and configured it as an input device.

To calibrate it, Tony wrote a tiny program to interpret the numbers from the event stream:

import sys
for line in sys.stdin.readlines():
  # each word from xxd is a little-endian 16-bit value in hex:
  # swap the two bytes, then parse the result as base 16
  (d3, d4, d1, d2) = line[:4]
  print int(d1 + d2 + d3 + d4, 16)

and a few pipes and filters later we had our min and max values:

xxd /dev/input/evtouch_event | grep '0300 0000' | cut -d' ' -f8 | python | sort -n

(The filename missing after python above is the file with the Python from before. The ’0300 0000′ filters for the event type and the X axis; for the Y axis it’s ’0300 0100′. I’ve doctored the original line to be a bit simpler, so it may need some hacking.)
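For the record, the whole byte-swap-and-decode step can be sketched as one self-contained Python function – this is my reconstruction for illustration, not the exact script we ran, and it assumes xxd-style 4-hex-digit words holding little-endian 16-bit values:

```python
def decode_words(hex_words):
    """Each 4-hex-digit word is a little-endian 16-bit value: the low
    byte comes first, so '2301' means 0x0123 = 291."""
    values = []
    for w in hex_words:
        lo, hi = w[:2], w[2:4]
        values.append(int(hi + lo, 16))
    return values

def calibration_range(hex_words):
    """The min and max of the decoded values give the calibration range."""
    values = decode_words(hex_words)
    return min(values), max(values)
```

Feeding it the words cut out of the event stream gives the same min/max that the `sort -n` pipeline produced.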

I needed a bit of experimentation with the XSwap, YSwap and Rotate options in the input driver configuration in xorg.conf, with accompanying restarts of X, then it was done!

The resolution is such that it’s possible, with a bit of concentration, to browse web sites with a finger. A nice side-effect of the screen layout is that it’s possible to scroll a full-screen window by running a thumb along beside the bezel.


Monte Carlo model for Presidential elections

Given state-by-state polling of a given contest for the Presidency – Obama v McCain, say – it’s easy to put together who is predicted to win the contest according to the polls. Work out who wins each state according to the poll, add up the electoral votes, and whoever gets the most is the winner.

However, this could give quite a misleading impression. Polling numbers are not 100% accurate. Among other factors, there is statistical error in the polling process itself, and in the process of handling sampling biases; there is the unpredictability of turnout; and there is straightforward change in voter intent between polling time and voting time. If all of the apparent winner’s states are won by tiny margins, while all of the loser’s states are substantial wins, then a proper analysis should show that the polling favours the apparent loser more than a simple win/lose analysis would suggest. How can we take that into account?

Simulated elections are one approach. To start with, we assume that every poll has an unknown error term, which is normally distributed with a mean of zero and a standard deviation of x%. Then we run thousands of simulated elections. In each election, we guess the noise term by generating an appropriate random variable, and subtract this from the poll numbers. Then as before we give each state to the apparent winner and add up who gets the most EVs. By taking thousands of simulated elections, we can obtain a reasonable estimate of the probability that a given candidate will win given the poll data available to us.
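A minimal sketch of that procedure in Python – the margins, electoral-vote counts and noise level one feeds in are all illustrative here, and the real model presumably differs in detail:

```python
import random

def win_probability(states, noise_sd, swing=0.0, trials=10000, seed=1):
    """Estimate a candidate's chance of winning the Electoral College.

    states: list of (margin, evs) pairs -- the candidate's polled lead
    in percentage points, and that state's Electoral College votes.
    noise_sd: standard deviation of the per-state polling error.
    swing: a uniform country-wide shift added to every margin.
    """
    rng = random.Random(seed)
    total_evs = sum(evs for _, evs in states)
    wins = 0
    for _ in range(trials):
        # give each state to whoever wins it after the noise is applied
        evs_won = sum(evs for margin, evs in states
                      if margin + swing + rng.gauss(0, noise_sd) > 0)
        if 2 * evs_won > total_evs:
            wins += 1
    return wins / float(trials)
```

The `swing` parameter is how the +1%/-1%/+2%/-2% scenarios earlier in the archive can be modelled: shift every margin by the same amount and re-run the simulation.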

There’s an “x” in the above paragraph; how much error should we assume in the polls? The only way to get a proper answer to this question would be to examine historical data; for the moment I’m punting on this and just looking at the different answers you can get from different estimates of the noise term.

The two lines cross around the 11% mark, so if you think that the final voting is likely to vary by 11% or more then the model shows Obama having a slight advantage, while for numbers less than that Clinton’s advantage is clear.

A few days ago, this graph was showing a crossover from Clinton to Obama at around the 6% mark, meaning that if you thought that today’s polls were likely to be out by 6% or more, you should vote for Obama. However, new polls in Michigan, Oregon, and Texas have changed the picture: both Oregon and Texas are now better for both candidates, while Michigan has moved from a win for Obama and a loss for Clinton to the exact opposite.

The moral of this story is probably that it is far too early to guess who will win the Presidential election. But it’s fun to try.

Data: May 12. Tip of the hat to pylab for by far the nicest way I’ve found so far to generate graphs for publication.

Paul Crowley

Diff for Javascript, revisited

Last weekend I finally revisited the diff-in-javascript code I’d written a couple of years back, adding (very simple) patch-like and diff3-like functionality.

On the way, not only did I discover Khanna, Kunal and Pierce’s excellent paper “A Formal Investigation of Diff3”, but I also found the revision-control wiki, which I’m just starting to get my teeth into. I’m looking forward to learning more about merge algorithms.

The code I wrote last weekend is available: just download diff.js. The tools included:

* Diff.diff_comm – works like a simple Unix comm(1)
* Diff.diff_patch – works like a simple Unix diff(1)
* Diff.patch – works like a (very) simple Unix patch(1) (it’s not a patch on Wall’s patch)
* Diff.diff3_merge – works like a couple of the variations on GNU’s diff3(1)

Let’s try out a few examples. First, we need to set up a few “files” that we’ll run through the tools. The tools work with files represented as arrays of strings – each string representing one line in the file – so we’ll define three such arrays:

var base = "the quick brown fox jumped over a dog".split(/\s+/);
var derived1 = "the quick fox jumps over some lazy dog".split(/\s+/);
var derived2 =
  "the quick brown fox jumps over some record dog".split(/\s+/);

Examining base shows us that we have the right format:

js> uneval(base);
["the", "quick", "brown", "fox", "jumped", "over", "a", "dog"]

First, let’s run the subroutine I originally wrote back in 2006:

js> uneval(Diff.diff_comm(base, derived1));
[{common:["the", "quick"]},
 {file1:["brown"], file2:[]},
 {common:["fox"]},
 {file1:["jumped"], file2:["jumps"]},
 {common:["over"]},
 {file1:["a"], file2:["some", "lazy"]},
 {common:["dog"]}]

The result is an analysis of the chunks in the two files that are the same, and those that are different. The same results can be presented in a terser differential format, similar to Unix diff:

js> uneval(Diff.diff_patch(base, derived1));
[{file1:{offset:2, length:1, chunk:["brown"]},
  file2:{offset:2, length:0, chunk:[]}},

 {file1:{offset:4, length:1, chunk:["jumped"]},
  file2:{offset:3, length:1, chunk:["jumps"]}},

 {file1:{offset:6, length:1, chunk:["a"]},
  file2:{offset:5, length:2, chunk:["some", "lazy"]}}]

Note the similarity of the results to the line-numbers and line-counts that diff(1) outputs by default.
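The same hunk structure falls out of Python’s standard difflib, if you want to cross-check the JavaScript results; here’s a sketch (names of my own choosing) that produces offset/length hunks in roughly the same shape as Diff.diff_patch:

```python
from difflib import SequenceMatcher

def diff_patch(file1, file2):
    """For each non-matching region, report offset, length and content
    on each side, mirroring the shape of Diff.diff_patch's output."""
    hunks = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=file1, b=file2).get_opcodes():
        if tag != "equal":
            hunks.append({
                "file1": {"offset": i1, "length": i2 - i1, "chunk": file1[i1:i2]},
                "file2": {"offset": j1, "length": j2 - j1, "chunk": file2[j1:j2]},
            })
    return hunks

base = "the quick brown fox jumped over a dog".split()
derived1 = "the quick fox jumps over some lazy dog".split()
```

Running `diff_patch(base, derived1)` yields the same three hunks as the JavaScript version above.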

The next example takes a diff-like patch, and uses it to reconstruct a derived file from the base file:

js> uneval(Diff.patch(base, Diff.diff_patch(base, derived1)));
["the", "quick", "fox", "jumps", "over", "some", "lazy", "dog"]

Finally, we attempt a diff3 style merge of the changes made in derived1 and derived2, hopefully ending up with a file without conflicts:

js> uneval(Diff.diff3_merge(derived1, base, derived2, true));
[{ok:["the", "quick", "fox", "jumps", "over"]},
 {conflict:{a:["some", "lazy"], aIndex:5,
            o:["a"], oIndex:6,
            b:["some", "record"], bIndex:6}},
 {ok:["dog"]}]

We see that the result isn’t what we’d hoped for: while two regions are unproblematic, and merge cleanly, the algorithm couldn’t decide how to merge one part of the inputs, and has left the conflict details in the output for the user to resolve.

There are algorithms that give different (and usually better) results than diff3 – the paper I mentioned above explains some of the problems with diff3, and I’m looking forward to reading about the alternatives others have come up with.


Visualising Clinton v Obama

The polling sites have been tracking the “head-to-head” polls for Hillary Clinton and Barack Obama, to study the question of which of them stands the better chance of defeating John McCain in November. Rather than plot the data on a map, I thought that a scattergram was a more useful tool with which to understand this data, with Clinton’s margin of defeat/victory on the horizontal axis and Obama’s on the vertical. The area of each state reflects the number of Electoral College votes that state carries.

Clinton’s biggest assets here are Florida and Ohio, two states with 47 electoral votes between them which Obama polls as marginally losing but which she comfortably wins. She’s also predicted to scrape Missouri’s ten electoral votes, which are definitely out of reach for Obama, and there seems no doubt that she’ll win West Virginia (5 EVs) while Obama won’t – 62 EVs in total. Her polled victory in Pennsylvania (21 EVs) is also by a much more comfortable margin – it’s very hard to see a Democrat losing PA and taking the country.

By comparison Obama only has three states in his corner; they only add up to 36 EVs, and none of them are comfortable wins for him, though they are all some way from Clinton’s reach. However, Obama has a lot of red states within reach; New Mexico, Nebraska, and South Carolina are the closest, but he can put up a good fight for Texas’s 34 electoral votes, which will be expensive for the Republicans to defend – and Obama will likely have a large financial advantage over them.

Indiana, meanwhile, is anyone’s.

Though either is more likely to win than lose, this chart predicts a much more comfortable victory for Clinton than for Obama. However, head-to-head polling doesn’t tell you everything: a candidate is chosen for their money-raising and get-out-the-vote capacity as well as for polling well. Oh, and I suppose some superdelegates might even care how well they’ll do the job.

The program is written in Python using Cairo, which is how I produce all my visualisations these days. The hardest part again was placing the labels for the states so that they don’t obscure each other.

Paul Crowley

E4X: Not as awful as I thought

Long, long ago, I complained about various warts and infelicities in E4X, the ECMAScript extensions for generating and pattern-matching XML documents. It turns out that two of my complaints were not well-founded: sequence-splicing is supported, and programmatic construction of tags is possible.

Firstly (and I’m amazed I didn’t realise this at the time, as I was using it elsewhere), it’s not a problem at all to splice in a sequence of items, in the manner of Scheme’s unquote-splicing; here’s a working solution to the problem I set myself:

function buildItems() {
  return <>
           <item>Hello</item>
           <item>World</item>
         </>;
}
var doc = <mydocument>{buildItems()}</mydocument>;

You can even use real Arrays (which is what I tried and failed to do earlier), by guerilla-patching Array.prototype:

Array.prototype.toXMLList = function () {
    var x = <container/>;
    for (var i = 0; i < this.length; i++) {
        x.appendChild(this[i]);
    }
    return x.children();
};

function buildItems() {
    return [<item>Hello</item>,
            <item>World</item>].toXMLList();
}
var doc = <mydocument>{buildItems()}</mydocument>;

Programmatic construction of tags is done by use of the syntax for plain old unquote, in an unusual position: inside the tag’s angle-brackets:

var tagName = "p";
var doc = <{tagName}>test</{tagName}>;

So in summary, my original expectation that E4X should turn out to be very quasiquote-like wasn’t so far off the mark. It’s enough to get the basics done (ignoring for the minute the problems with namespace prefixes), but it’s still a bit of a bolt-on afterthought; it would have been nice to see it better integrated with the rest of the language.


Ubuntu on EeePC is fairly slick

The instructions were pretty easy to follow (admittedly, after ten years you learn where the awkward spots are in Linux installations) and the result is a tiny, snappy, fully-working Ubuntu machine, complete with webcam and wifi. The only bit I haven’t got working yet is microphone input to Skype; my bet is that it’s a simple mixer setting.

Update: This page has instructions on how to get the microphone working. The key piece of information is the “Capture Switch” settings.






