Here’s a fun thing:
The Analysis & Resynthesis Sound Spectrograph, or ARSS, is a program that analyses a sound file into a spectrogram and is able to synthesise this spectrogram, or any other user-created image, back into a sound.
Upon discovery of this juicy little tool the other day, Andy and I fell to discussing potential applications. We have a few USB cameras around the office for use with camstream, our little RabbitMQ demo, so we started playing with using the feed of frames from the camera as input to ARSS.
The idea is that a frame captured from the camera can be used as a spectrogram of a few seconds’ worth of audio. While the system is playing through one frame, the next can be captured and processed, ready for playback. This could make an interesting kind of hybrid between dance performance and musical instrument, for example.
We didn’t want to spend a long time programming, so we whipped up a few shell scripts that convert a linux-based, USB-camera-enabled machine into a kind of visual synthesis tool.
Below is a frame I captured, and the processed form in which it is sent to ARSS for conversion to audio. Here’s the MP3 of what the frame sounds like.
Each frame is run through ImageMagick’s “charcoal” effect, which does a good job of finding edges in the picture; the result is inverted and passed through a minimum-brightness threshold. The resulting line-art-like frame is run through ARSS to produce a WAV file, which can then be played or converted to mp3.
You will need:
* one Debian, Ubuntu or other linux computer, with a fairly fast CPU (anything newer than ca. 2006 ought to do nicely).
* a USB webcam that you know works with linux.
* a copy of ARSS, compiled and running. Download it here.
* the program “webcam”, available in Debian and Ubuntu via apt-get install webcam, or otherwise as part of xawtv.
* “sox”, via apt-get install sox or from the sox homepage.
* ImageMagick, via apt-get install imagemagick or from the ImageMagick website.
The scripts are crude, but somewhat effective. Three processes run simultaneously, in a loop:
* webcam runs in the background, capturing images as fast as it can, and (over-)writing them to a single file, webcam.jpg.
* a shell script called grabframe runs in a loop, converting webcam.jpg through the pipeline illustrated above to a final wav file.
* a final shell script repeatedly converts the wav file to raw PCM data, and sends it to the sound card.
Here’s the contents of my ~/.webcamrc:
```
[grab]
delay = 0
text = ""

[ftp]
local = 1
tmp = uploading.jpg
file = webcam.jpg
dir = .
debug = 1
```
Here’s the grabframe script:
```sh
#!/bin/sh

THRESHOLD_VALUE=32768
THRESHOLD="-black-threshold $THRESHOLD_VALUE"
CHARCOAL_WIDTH=1
LOG_BASE=2
MIN_FREQ=20
MAX_FREQ=22000
PIXELS_PER_SECOND=60

while [ ! -e webcam.jpg ]; do sleep 0.2; done

convert -charcoal $CHARCOAL_WIDTH -negate $THRESHOLD webcam.jpg frame.bmp
mv webcam.jpg frame.jpg
./arss frame.bmp frame.wav.tmp \
    --log-base $LOG_BASE --sine \
    --min-freq $MIN_FREQ --max-freq $MAX_FREQ \
    --pps $PIXELS_PER_SECOND -f 16 --sample-rate 44100
mv frame.wav.tmp frame.wav
```
You can tweak the parameters and save the script while the whole thing is running, to experiment with different options during playback.
To start things running:
* In shell number one, run “webcam”.
* In shell number two, run “while true; do ./grabframe ; done”.
* In shell number three, run “(while true; do sox -r 44100 -c 2 -2 -s frame.wav frame.raw; cat frame.raw; done) | play -r 44100 -c 2 -2 -s -t raw -”.
That last command repeatedly takes the contents of frame.wav, as output by grabframe, converts it to raw PCM, and pipes it into a long-running play process, which sends the PCM it receives on its standard input out through the sound card.
If you like, you can use esdcat instead of the play command in the pipeline run in shell number three. If you do, you can use extace to draw a spectrogram of the sound that is being played, so you can monitor what’s happening, and close the loop, arriving back at a spectrogram that should look something like the original captured images.
Google’s latest open-source initiative: Protocol Buffers. Like XML, this is a generic language-neutral hierarchical container format, but unlike XML it is a binary format designed for low size overhead and fast reading and writing; in this it resembles some other initiatives to that end, including Facebook’s Thrift. I am keen to see a good binary alternative to XML become popular, and there are several clever and attractive aspects to PB’s design, but the wire format has some shortcomings, and I don’t think it would be hard to design a better alternative.
A PB file consists of a sequence of numbered fields; applications that want to use PB write protocol description (“.proto”) files describing what these fields mean. One of the goals of PB is that old consumers should be able to read the data produced by newer producers, by simply skipping unexpected fields. This means in turn that consumers have to be able to find the end of a field without needing any out-of-band metadata about the field. Take a look at the wire format documentation to see how PB achieves that – seven-bit integer encoding, ZigZag, wire types, and embedded messages. Each of these is a sacrifice in both cleanliness and performance.
The “varint” encoding, which writes integers in seven-bit groups with the high bit as a stop bit, is inefficient both for reading and writing; you can’t just send the bytes you have to the wire, you have to measure them out seven bits at a time with shifts and check when you’ve finished; each check can also cause pipeline stalls. ZigZag is even worse – it shifts the whole number up in order to put the sign bit in the least significant bit, so there’s more work to do once the number is assembled. Wire types are a nasty wart; they’re not enough information to actually know what is being sent, and they are there only so that the length of what is being sent can be inferred properly where a field is unknown (and to work around the inefficiency of the “varint” encoding by providing special fixed-length integer encodings). And the only non-deprecated way of creating hierarchy is to embed a message as a string – which means that every data structure below the top level must be completely marshalled in memory before it is sent, so that its length can be measured.
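To make the cost concrete, here is a minimal Python sketch of the varint and ZigZag encodings as described in the published wire-format documentation (the function names are mine); note the per-group shifting and continuation check in the varint loop, and the extra bit-twiddling ZigZag adds on top:

```python
def encode_varint(n):
    """Encode a non-negative integer in seven-bit groups,
    least significant group first, high bit set on every
    byte except the last."""
    out = bytearray()
    while True:
        b = n & 0x7f
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation: more groups follow
        else:
            out.append(b)
            return bytes(out)

def zigzag(n):
    """Map a signed integer (fitting in 64 bits) to an unsigned one
    so that small negatives stay small: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)
```

For example, `encode_varint(300)` produces the two bytes `0xac 0x02`, assembled seven bits at a time rather than copied directly from memory.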
So here’s my alternative proposal – same logical structure, different wire format. We encode the data as a stream of “atoms”. An atom is either a hunk of binary data, or a “special symbol”. Here’s how we decode an atom:
```python
def readIntOrLen(lim, stream, i):
    if i < lim:
        return i
    return toUint(stream.read(1 + i - lim))
    # Could force eg one-byte encoding to cover range [lim, lim+255]
    # instead of [0, 255] here but is it worth it?

def readAtom(stream):
    firstByte = toUint(stream.read(1))
    fieldId = readIntOrLen(12, stream, firstByte >> 4)
    if fieldId == 0:
        return specialSymbol(firstByte & 0x0f)
    dataLen = readIntOrLen(8, stream, firstByte & 0x0f)
    return fieldId, stream.read(dataLen)
```
So we make the head byte do as much work as possible; it has one nybble for the field ID, one nybble for the data length, and no “wire type”. If the nybble is too small, we overflow into the rest of the data stream, using as few bytes as possible but encoding in a natural 8-bit little-endian format. For structures that have fewer than 12 members, this adds only one byte of overhead on any field which is fewer than 8 bytes long. Right away, that’s 7-bit-footed varints, ZigZag, and wire types dispensed with. We’ve reserved fieldId 0 for special symbols; we can encode 16 different symbols.
To avoid the pre-marshalling problem, we re-introduce the grouping symbols that Google deprecated. A “start group” symbol should be followed by a field ID; this will normally be written in the following byte, but values above 251 will take up more bytes: in other words we read the group ID with
readIntOrLen(252, stream, toUint(stream.read(1))). This is then paired with an “end symbol”. Obviously these symbols act like brackets in SPKI and can be nested. So you can start writing a nested data structure just by emitting the start symbol, marshalling the structure directly to the output stream, and then emitting the end symbol; no pre-marshalling needed.
This fixes the four problems I identified with Google’s wire format, but we still have 14 special symbols left, and I see nice things to do with them.
Let specialSymbol(0) be NOP. Note that this is just the same as a zero byte, so runs of zero bytes in between proper data now have no effect. On disk, you can “zero out” bits of data you don’t need without having to copy up all the other data to take their place; or you can leave zero blocks in useful places (eg at the head of a file) where you might want to write real data later.
Let’s have “begin chunked” and “end chunked” symbols. All atoms in between should be hunks of binary data with the same field ID; the symbols mean these should be concatenated to get the final answer. Now you can start writing even an individual binary field before you know how long it’s going to be, useful if you’re generating it on the fly.
Let’s have a “begin metadata” symbol, which should be followed by a group; the group ID can identify what sort of metadata. Parsers must ignore metadata they don’t understand, but must fail if they see special symbols they don’t understand. We can put for example the protocol description itself in the metadata, suitably encoded, so that smart clients can use it eg to display the data meaningfully.
Let’s have a “version” symbol, which is followed by a version byte. If the version is unknown, the parser must fail. We should choose these symbols carefully, so that the first two bytes of the file can be an appropriate magic number not used by other formats.
We can trivially define a “canonical encoding” for signing: it is precisely the shortest encoding for the data given, with fields in sorted order by fieldID.
This seems to me to be nearly always better than Google’s proposal – are there angles I’ve missed?
Update: I wasn’t happy with the two-byte overhead for eight-byte fields; here’s an encoding that has a one-byte overhead for fields of 12 bytes or fewer.
```python
def readIntOrLen(lim, stream, i):
    if i < lim:
        return i
    return toUint(stream.read(1 << (i - lim)))
    # Could force eg one-byte encoding to cover range [lim, lim+255]
    # instead of [0, 255] here but is it worth it?

def readAtom(stream):
    firstByte = toUint(stream.read(1))
    fieldId = readIntOrLen(13, stream, firstByte >> 4)
    if fieldId == 0:
        return specialSymbol(firstByte & 0x0f)
    dataLen = readIntOrLen(13, stream, firstByte & 0x0f)
    return fieldId, stream.read(dataLen)
```
This limits length descriptors to being 0, 1, 2, or 4 bytes long. Note that this limits individual chunks to 4GB. If you’re sending disk images, use the chunked encoding; the canonical encoding rules will need a special case for this.
Speaking of OMeta/JS and OMeta in general, I’ve implemented an OMeta for Scheme. Currently it runs in MzScheme, but it should be fairly portable, with dependencies only on a handful of commonly-implemented SRFIs. I intend to properly libraryise it — making it into a proper MzScheme module — and to port it to other schemes, most probably starting with SISC.
One interesting feature of this OMeta is that it implements the error-handling mechanisms suggested by Bryan Ford that I implemented previously in a packrat parsing library for Scheme. The packrat-based error-handling techniques seem to generalise fairly well to an OMeta setting.
* Browse the code here.
* Check it out with
hg clone http://www.eighty-twenty.org/hgwebdir.cgi/ometa-scheme/.
Update: Switched from hosting the ometa-scheme code in Darcs to Mercurial.
Other points that occurred to me as I was working:
The end result of a couple of hours’ hacking is an implementation of Hunt-McIlroy text diff (that works over arbitrary SequenceableCollections, and has room for alternative diff implementations) and a diff3 merge engine, with a few unit tests. You can read a fileout of the code, or use Monticello to load the DiffMerge module from my public Monticello repository. [Update: Use the DiffMerge Monticello repository on SqueakSource.]
If Monticello didn’t already exist, it’d be a very straightforward matter indeed to build a DVCS for Smalltalk from here. I wonder if Spoon could use something along these lines?
It also occurred to me it’d be a great thing to use OMeta/JS to support the use of
```html
<script type="text/smalltalk">"<![CDATA["
(document getElementById: 'someId')
    innerHTML: '<p>Hello, world!</p>'
"]]>"</script>
```
| DVCS concept | TiddlyWiki equivalent |
|---|---|
| Repository | The html file contains everything |
| File within repository | Tiddler within wiki |
| Commit a revision | Save the wiki to disk |
| Save a text file | Edit a tiddler |
| Push/pull synchronisation | Import from other file |
If you have Firefox (it doesn’t work with other browsers yet!) you can experiment with an alpha-quality DVCS-enabled TiddlyWiki here. Take a look at the “Versions” tab, in the control panel at the right-hand-side of the page. You’ll have to download it to your local hard disk if you want to save any changes.
It’s still a prototype, a work-in-progress: the user interface for version management is clunky, it’s not cross-browser, there are issues with shadow tiddlers, and I’d like to experiment with a slightly different factoring of the repository format, but it’s good enough to get a feel for the kinds of things you might try with a DVCS-enabled TiddlyWiki.
Despite its prototypical status, it can synchronize content between different instances of itself. For example, you can have a copy of a SynchroTiddly on your laptop, email it to someone else or share it via HTTP, and import and merge their changes when they make their modified copy visible via an HTTP server or email it back to you.
I’ve been documenting it in the wiki itself — if anyone tries it out, please feel free to contribute more documentation; you could even make your altered wiki instance available via public HTTP so I can import and merge your changes back in.
Update: Because it depends on a newer release of RabbitMQ than 1.3.0, you will also need to check out the server and codegen code from our public mercurial repositories, or download them as snapshots: server, codegen.
By bridging between the two systems, we benefit from:
* XMPP’s internet-scale addressing and federation
* XMPP’s presence model
* AMQP’s store-and-forward capability
* AMQP’s routing and filtering (using exchanges and bindings)
The current implementation is a very simple mapping between the two systems. Its simplicity keeps the code short, but only exposes a subset of AMQP features to the XMPP network, and vice versa.
You are currently browsing the LShift Ltd. blog archives for July, 2008.