Here’s a fun thing:
The Analysis & Resynthesis Sound Spectrograph, or ARSS, is a program that analyses a sound file into a spectrogram and is able to synthesise this spectrogram, or any other user-created image, back into a sound.
Upon discovery of this juicy little tool the other day, Andy and I fell to discussing potential applications. We have a few USB cameras around the office for use with camstream, our little RabbitMQ demo, so we started playing with using the feed of frames from the camera as input to ARSS.
The idea is that a frame captured from the camera can be used as a spectrogram of a few seconds’ worth of audio. While the system is playing through one frame, the next can be captured and processed, ready for playback. This could make an interesting kind of hybrid between dance performance and musical instrument, for example.
We didn’t want to spend a long time programming, so we whipped up a few shell scripts that convert a linux-based, USB-camera-enabled machine into a kind of visual synthesis tool.
Below is a frame I captured, along with the processed form in which it is sent to ARSS for conversion to audio. Here’s the MP3 of what the frame sounds like.
Each frame is run through ImageMagick’s “charcoal” effect, which does a good job of finding edges in the picture, then inverted and passed through a minimum-brightness threshold. The resulting line-art-like frame is run through ARSS to produce a WAV file, which can then be played or converted to MP3.
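The image-processing half of that pipeline can be tried by hand on any picture. A minimal sketch, assuming ImageMagick’s convert is installed (the filenames are placeholders; a 50% threshold is equivalent to the 32768 value used later, which is on ImageMagick’s 16-bit quantum scale):

```shell
# Build a stand-in frame, then apply the charcoal -> negate -> threshold
# pipeline described above to get the line-art-like image fed to ARSS.
convert -size 320x240 gradient:black-white frame-in.png
convert -charcoal 1 -negate -black-threshold 50% frame-in.png frame-out.bmp
```

Darker, busier input images survive the threshold with more lines, and therefore produce denser, louder audio.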
You will need:
* one Debian, Ubuntu or other linux computer, with a fairly fast CPU (anything newer than ca. 2006 ought to do nicely).
* a USB webcam that you know works with linux.
* a copy of ARSS, compiled and running. Download it here.
* the program “webcam”, available in Debian and Ubuntu with apt-get install webcam, or otherwise as part of xawtv.
* “sox”, via apt-get install sox or the sox homepage.
* ImageMagick’s “convert”, via apt-get install imagemagick or from ImageMagick.
The scripts are crude, but somewhat effective. Three processes run simultaneously, in a loop:
* webcam runs in the background, capturing images as fast as it can, and (over-)writing them to a single file, webcam.jpg.
* a shell script called grabframe runs in a loop, converting webcam.jpg through the pipeline illustrated above to a final wav file.
* a final shell script repeatedly converts the wav file to raw PCM data, and sends it to the sound card.
Here’s the contents of my ~/.webcamrc:
[grab]
delay = 0
text = ""

[ftp]
local = 1
tmp = uploading.jpg
file = webcam.jpg
dir = .
debug = 1
Here’s the grabframe script:
#!/bin/sh

THRESHOLD_VALUE=32768
THRESHOLD="-black-threshold $THRESHOLD_VALUE"
CHARCOAL_WIDTH=1
LOG_BASE=2
MIN_FREQ=20
MAX_FREQ=22000
PIXELS_PER_SECOND=60

while [ ! -e webcam.jpg ]; do sleep 0.2; done

convert -charcoal $CHARCOAL_WIDTH -negate $THRESHOLD webcam.jpg frame.bmp
mv webcam.jpg frame.jpg

./arss frame.bmp frame.wav.tmp \
    --log-base $LOG_BASE --sine \
    --min-freq $MIN_FREQ --max-freq $MAX_FREQ \
    --pps $PIXELS_PER_SECOND -f 16 --sample-rate 44100
mv frame.wav.tmp frame.wav
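The write-to-frame.wav.tmp-then-mv step at the end of grabframe is deliberate: a rename within one filesystem is atomic, so the playback loop never picks up a half-written WAV. The pattern in isolation (with a line of text standing in for the WAV data):

```shell
# Write-then-rename: the producer writes to a temporary name, then an
# atomic mv publishes it, so a reader never sees a partial file.
printf 'complete file contents\n' > frame.wav.tmp
mv frame.wav.tmp frame.wav
cat frame.wav
```

webcam uses the same trick, which is why the .webcamrc above sets tmp = uploading.jpg alongside file = webcam.jpg.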
You can tweak the parameters and save the script while the whole thing is running, to experiment with different options during playback.
To start things running:
* In shell number one, run “webcam”.
* In shell number two, run “while true; do ./grabframe ; done”.
* In shell number three, run “(while true; do sox -r 44100 -c 2 -2 -s frame.wav frame.raw; cat frame.raw; done) | play -r 44100 -c 2 -2 -s -t raw -”.
That last command repeatedly takes the contents of frame.wav, as output by grabframe, converts it to raw PCM, and pipes it into a long-running play process, which sends the PCM it receives on its standard input out through the sound card.
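The shape of that command — a looping producer feeding one long-lived consumer through a single pipe — can be seen with plain shell tools; here cat stands in for play, and short text chunks stand in for raw PCM:

```shell
# A looping producer piped into one long-running consumer: the consumer's
# stdin stays open across loop iterations, just as play's does above.
( for i in 1 2 3; do printf 'chunk %s\n' "$i"; done ) | cat
```

Keeping play running across iterations matters: restarting it for every frame would re-open the sound device each time and introduce an audible gap between frames.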
If you like, you can use esdcat instead of the play command in the pipeline run in shell number three. If you do, you can use extace to draw a spectrogram of the sound that is being played, so you can monitor what’s happening, and close the loop, arriving back at a spectrogram that should look something like the original captured images.