technology from back to front

Archive for February, 2011

A standard log file format?

mercurial-server logs every push and pull to the repository. These logs are not just informative, but part of the security it offers; a mercurial revision can trivially be attributed to any user, so if malicious code is added to a repository only the mercurial-server logs carry trustworthy information about the source. It’s therefore especially important that these logs be unambiguously parseable.

For versions up to and including 1.0, log entries looked like this:


2010-12-01_00:13:21 push key=lshift/alexander/mrnoisy changeset=b46d2312ee9ee4338d9a1a8585b2959afa178e6b
2010-12-01_10:10:22 pull key=lshift/paul/arizonabay changeset=b46d2312ee9ee4338d9a1a8585b2959afa178e6b

However, this format assumes that the “values” in the key-value pairs will never contain spaces or newlines, and two changes broke that. The first was that I lifted the rules which forbade most troublesome characters – like space, newline, or equals – from appearing in key names. The second was that I wanted to store more information in each message: in particular the SSH_CONNECTION environment variable, which also generally contains spaces. “I can’t be the first person in this situation”, I thought. “There will be a standard format for logging out there on the Internet, with libraries to write and parse it in more than one language; all I have to do is plug that format into my program and I’ll adhere to the standard.”
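To make the ambiguity concrete, here is a minimal sketch of a naive parser for the old format. The function `parse_old_entry` and the key name `mallory key=lshift/paul/arizonabay` are hypothetical illustrations, not taken from mercurial-server; the point is only that once key names may contain spaces and equals signs, two different key names can parse identically.

```python
def parse_old_entry(line):
    """Naively parse 'timestamp op name=value name=value ...'."""
    timestamp, op, *pairs = line.split(" ")
    # Split each pair at its first "="; a repeated name overwrites the earlier one.
    fields = dict(pair.split("=", 1) for pair in pairs)
    return timestamp, op, fields

CHANGESET = "b46d2312ee9ee4338d9a1a8585b2959afa178e6b"

# An entry whose key name contains no troublesome characters parses as intended.
honest = "2010-12-01_10:10:22 pull key=lshift/paul/arizonabay changeset=" + CHANGESET

# Hypothetical key name containing a space and an equals sign.
forged_key = "mallory key=lshift/paul/arizonabay"
forged = "2010-12-01_10:10:22 pull key=" + forged_key + " changeset=" + CHANGESET

# Both entries appear to come from the same key.
print(parse_old_entry(honest)[2]["key"])
print(parse_old_entry(forged)[2]["key"])
```

With this parser, the forged entry is attributed to `lshift/paul/arizonabay`, which is exactly the kind of confusion a security log must rule out.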

If it’s out there, I didn’t find it. It appears that in the forty years that we’ve been writing Unix programs, every single little program that needed to log something has cooked up its own format for doing so. If you want to actually do something with this logged information besides eyeball it, you’re going to have to write a new parser for each one. This time, I thought I’d try to write my logs in a format for which there were already parsers.

XML was obviously not an option; an XML document has to have a closing tag, but a log is constantly being appended to. Besides which, XML is very heavyweight, and poorly suited to non-textual information.

JSON is much lighter and better suited to general-purpose human-readable serialization; sadly, it has the same “closing tag” problem as XML. Making each line its own JSON document would solve that problem, but you would then have to hand-roll at least one part of your parser. That is trivial, but not as trivial as simply pointing a library at the log file and saying “read this”, and it is noticeably more difficult in a language such as C, where all string handling is painful.

I thought about CSV, which has a quoting convention that allows it to store arbitrary strings; however, those quoting conventions are fairly weird and can include newline characters in the middle of a logical line. Also, CSV isn’t self-documenting the way JSON is, and can’t handle any more complex data structures than a list of strings.
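The newline problem is easy to see with Python's standard `csv` module. This is only a sketch; the field values are made up for illustration. A single logical row whose last field contains a newline occupies two physical lines in the file, so line-oriented tools such as `grep` or `tail` no longer see whole entries.

```python
import csv
import io

# Write one logical row whose third field contains a newline.
buf = io.StringIO()
csv.writer(buf).writerow(["2010-12-01_00:13:21", "push", "key with\nnewline"])
text = buf.getvalue()

# A line-oriented tool sees two physical lines...
physical_lines = text.splitlines()

# ...while a CSV parser sees one row of three fields.
rows = list(csv.reader(io.StringIO(text, newline="")))

print(len(physical_lines), len(rows))
```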

YAML is a very flexible output format with some very nice examples showing how it can be used for logging. Unlike any of the others, YAML is most definitely appendable. However, Python doesn’t ship with a YAML parser, so if mercurial-server were to use that format it would depend on an extra library.

Or, as it turns out, not. YAML 1.2 is a superset of JSON. By using YAML format for the sequencing and JSON for everything else, I could produce a line-oriented log format that could contain everything I could throw at it. For versions of mercurial-server from 1.1 on, the log file format looks like this:

- {"timestamp": "2010-12-20_12:02:13 Z", "nodes": ["f800743d1c75ff925effb5ffa52dc974eff3f482"], "ssh_connection": "192.168.23.3 55904 192.168.23.2 22", "key": "lshift/paul/arizonabay", "op": "push"}

YAML uses a hyphen at the start of a line to build a sequence, so with a YAML parser the whole thing appears to be a sequence of dictionaries. However, after the hyphen the line is pure JSON, so you can trivially write the format with only a JSON serializer, and parsing it with one is only a little harder. Also, newline characters only appear at the end of log entries, so if the program crashes mid-way through writing a log entry, re-synchronizing is easy.
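As a rough sketch of what this looks like using nothing but Python's standard `json` module: the helper names `format_entry` and `parse_log` are hypothetical, and mercurial-server's actual code may differ. Writing is a hyphen, a space, and one JSON document per line; reading strips the hyphen and skips any line that fails to parse, which is how a reader gets back in step after a damaged entry.

```python
import json

def format_entry(entry):
    """Serialize one log entry as a YAML sequence item containing JSON."""
    # json.dumps escapes newlines inside strings, so the entry stays on one line.
    return "- " + json.dumps(entry) + "\n"

def parse_log(text):
    """Parse a log with only a JSON parser, skipping damaged lines."""
    entries = []
    for line in text.split("\n"):
        if not line.startswith("- "):
            continue  # blank line, or a fragment that is not a sequence item
        try:
            entries.append(json.loads(line[2:]))
        except ValueError:
            continue  # damaged entry: carry on from the next newline
    return entries

entry = {
    "timestamp": "2010-12-20_12:02:13 Z",
    "nodes": ["f800743d1c75ff925effb5ffa52dc974eff3f482"],
    "ssh_connection": "192.168.23.3 55904 192.168.23.2 22",
    "key": "lshift/paul/arizonabay",
    "op": "push",
}

# A log with a truncated entry between two good ones.
log = format_entry(entry) + '- {"timestamp": "2010-12-20_12:0\n' + format_entry(entry)
print(len(parse_log(log)))
```

Values containing spaces, equals signs or newlines need no special handling, since the JSON serializer quotes and escapes them.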

I can’t claim that this is the log file format to end all log file formats. But it’s simple, self-documenting and comfortably human-readable, easy to write and parseable with existing tools. So if this isn’t the right thing, could someone please work out what is, so that applications like mercurial-server that don’t have particularly unusual needs can just stick to a common standard, rather than having us all hand-roll our own new invention for this common need?

by
Paul Crowley
on
28/02/11

Data visualisation: How weird is our jukebox?

One of the perennial unanswered questions among LShift hackers is “How weird is our jukebox?”. Not in the sense that it’s written in Erlang, but in the sense that we do play an awful lot of rather weird stuff on there.

The usual answer to this is “very”, but I’ve been thinking for a while about putting some numbers to it. The problem is that there’s no real standard for weirdness, nothing I can really use to establish which songs are odder than others (barring possibly some bleeding-edge music analysis, which isn’t really my field). I had an idea of something I could get at though that could serve as a stand-in: listener counts from last.fm.
Read more…

by
Tom Parker
on

ASP.NET MVC 2: mocking your HTTP layer

We recently started a project using Microsoft’s ASP.NET MVC 2 framework. Since I’m pretty big on test-driven development, I immediately wanted to start unit testing the controllers. It turns out that it’s a bit harder than I thought: in production, MVC does a whole bunch of stuff before execution ever reaches your controllers. And if you naively just start applying TDD, you get all sorts of funky NullReferenceExceptions deep in the bowels of MVC.

Read more…

by
Frank Shearar
on
21/02/11

Embedded video and progressive download: A Quiz

I will provide you with two video files, video1.flv and video2.wmv. You need to embed them on the page and ensure that they use progressive download. Both video files are larger than 1GB, so it will be obvious whether they are playing before they have completely downloaded. You will need to use the Flash video player that I have provided for the Flash video. Which one of the HTML snippets shown below should you use?

Snippet A

<object type="application/x-shockwave-flash" data="/player.swf" >
  <param name="movie" value="/player.swf"/>
  <param name="FlashVars" value="flv=/video1.flv"/>
</object>

<object type="video/x-ms-wmv">
  <param name="FileName" value="/video2.wmv"/>
</object>

Snippet B

<object type="application/x-shockwave-flash" data="/player.swf" >
  <param name="movie" value="/player.swf"/>
  <param name="FlashVars" value="flv=http://myserver.lshift.net/video1.flv"/>
</object>

<object type="video/x-ms-wmv">
  <param name="FileName" value="http://myserver.lshift.net/video2.wmv"/>
</object>

Read more…

by
tim
on
13/02/11

Squeak 4.2 released

Squeak 4.2 has finally shipped!

It continues the improvements started in 4.0 and 4.1, with the trunk model revitalising the community: a small group of dedicated, frequent committers provide the main thrust of development, supported by a very simple and lightweight way of providing bugfixes, enhancements, and the like.

Squeak 4.2 also ships with the Cog VM, Eliot Miranda’s much speedier virtual machine – most people report a 2x to 10x speed increase over the older VM.

Unfortunately for me, there’s a well-known issue with Squeak’s UUID plugin on 64-bit Linux machines, so if you’re running one of those, do yourself a favour and delete the UUIDPlugin: rm coglinux/lib/squeak/3.9-7/UUIDPlugin.

Read more…

by
Frank Shearar
on
12/02/11

Apache Camel and RabbitMQ

I’m evaluating Apache Camel for use on a client project, but we need to back it on to RabbitMQ. The AMQP component that comes with Camel is based on the Qpid 0.5.0 client which does not work too well with Rabbit, so this seemed a good excuse to experiment with custom Camel components.

There’s a first pass on GitHub for anyone who wants to play. Note the phrase “first pass” and that the Limitations section of the README file is longer than the Usage one.

by
Lee Coomber
on
04/02/11

Onzo now so live, you can buy it

I was delighted to see that Onzo have released a consumer version of their revolutionary energy metering product to the UK market. They have already been picking up design awards, and once people realise what’s inside the good-looking enclosure, they’ll surely pick up a host of green tech ones too.

Why is it revolutionary?

Unlike most smart meters that simply present energy consumption data, the Onzo contains a lot of “very clever” software that makes it much more useful as it interprets your data and your usage trends, and has the potential to do much more to help you practically reduce your energy consumption. Having helped Onzo design and build the software, we know it can do much more besides…

Congratulations Onzo! It’s been a long haul, but it’s worth it.

by
mike
on
01/02/11

You are currently browsing the LShift Ltd. blog archives for February, 2011.
2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK. +44 (0)20 7729 7060