technology from back to front

TDD for Esoteric Programming Languages (using Clojure and Befunge)

I’ve been learning Clojure recently, and I’d been looking around for a good initial project to test my new knowledge. I’ve always wanted to write a Befunge interpreter, and so decided that sounded like a fun project. Little did I know the maze of twisty little passages I was letting myself in for, but I’ve learnt a lot of interesting things along the way, and there’s a few things worth sharing.

Befunge is an Esoteric programming language. To quote Wikipedia:

An esoteric programming language [...] is a programming language designed as a test of the boundaries of computer programming language design, as a proof of concept, or as a joke

Examples abound on the internet – LOLCODE, Brainfuck, Piet and INTERCAL (containing such infamous items as the COME FROM instruction).

Befunge is a two-dimensional, stack-based, self-modifying language. You start with a grid of characters in the top-left location going right, and then do things based on each character as you encounter it, much of which involve the stack. e.g. push an item on the stack, change direction based on the current top-of-stack value, etc. Net result of all of this is you have a 2-d program counter, and a 2-d “delta” which indicates the current direction of travel.

Hello world (well, one of the variants) looks like the following:

"!dlrow ,olleH":v
             v:,_@
             >  ^

One of the items to note here is that the series of “v” “^” and “>” characters act as a loop around the “,” which prints out the string “Hello, world!” which has been placed initially onto the stack in reverse order.

Befunge also has quite a long and detailed history. It was created in 1993 to start with, and the original (now called Befunge-93) spec was fairly limited (80×25 character grid only), but the later Befunge-98 spec removes the space limits, and adds in some features of more conventional languages including concurrency (there’s a “fork” instruction that sends the child going left and the parent right), and a form of libraries (in the form of “fingerprints” that add new instructions). There’s also many different interpreters/compilers (including one written in INTERCAL)

However the thing that got me doing this was the existence of the Mycology test suite. This starts off very simple, using a limited subset of the instructions to make sure some basic things work, and then the more advanced test suite which adds one feature at a time and tests them using the basic features. Also, the implementation itself is used to output the results (which works provided the basic instruction output did the right thing).

What this meant was that I could build a very limited interpreter, and only build new items as required. My initial interpreter did nothing more than read the test suite into memory, and then try and run the first instruction. As I hadn’t defined any instructions yet, it promptly failed, but I could then go back and implement an instruction and try again. For someone wanting to create a new interpreter for a language, this is amazingly useful.

Also, this meant I only had to implement more advanced features as and when needed. My initial version of the instruction grid used a contiguous array-of-arrays to store items, but when I needed to locate things at locations like (2^32)-1 by (2^32)-1, this became impractical. I then moved to a sparse list-of-lists, which I then had to derive algorithms for “jumping” through, as Befunge’s ” ” instruction does nothing and so is present everywhere anyone hasn’t added anything yet, but you want to be able to wrap-around (the instruction space is referred to as Lahey-Space) without taking a vast amount of time to skip over the large areas of empty space. This gets even more tricky when you stop having a program “delta” that just does simple things like “left” and “right” and starts looking more like a Knight’s move in chess…

So I’m now 2 months in, and I’ve got near-complete Befunge-98 compliance for Clostridium (Befunge things tend to be named after fungi and stuff, so I went for the Clostridium family of bacteria). Unfortunately, despite all the useful bits here, full compliance is very hard (the spec is ambiguous in various places, which never helps), and so I’ve decided it’s good enough for the moment. However it’s close enough to run quite a lot of things, and that’s good enough for me, and I’ve learnt a lot of Clojure along the way.

Future ideas include:

* Clostridium is probably capable of N-dimensional options as a side effect of the functional style of programming, but I’d need to add new instructions for >3 (Trefunge is the 3-dimensional variant).
* Clojurescript – get something viewable in a web browser.

by
Tom Parker
on
25/08/12
 
 


six − 4 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us