Posts filed under 'Erlang'
Erlang represents strings as lists of (ASCII, or possibly ISO 8859-1) codepoints. In this regard, it’s weakly typed - there’s no hard distinction between a string, “ABC”, and a list of small integers, [65,66,67]. For example:
Eshell V5.5.4 (abort with ^G)
1> "ABC".
"ABC"
2> [65,66,67].
"ABC"
3>
Erlang also has a binary type, a simple vector of bytes. In the rfc4627/JSON codec I made for Erlang, I chose to use binaries to represent decoded strings, as suggested by Joe Armstrong.
All was well - until I came to implement UTF-8 support after Sam Ruby got the ball rolling. Binaries holding one byte per character will no longer work as the mapping for JSON strings, since strings may contain arbitrary characters, including those with codepoints greater than 255.
It has always been the case that the ideal representation for a JSON string is an Erlang string, a list of codepoints. Binaries are really a bit of a compromise. But choosing strings-for-strings puts us straight back in a weakly-typed position: it’s possible in JSON to distinguish between “ABC” and [65,66,67], but it’s not possible to make the same distinction in Erlang. We’d need to alter the way JSON arrays are represented to compensate.
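The ambiguity, and the trouble with codepoints above 255, can be seen in a small sketch. It assumes a modern Erlang/OTP release that ships the unicode module, which postdates this post; the module and function names are hypothetical and chosen only for illustration.

```erlang
-module(string_repr_demo).
-export([same_term/0, utf8_binary/1, byte_binary/1]).

%% A string literal and a list of small integers are the very same term.
same_term() ->
    "ABC" =:= [65,66,67].

%% Encode a list of codepoints as a UTF-8 binary.
%% unicode:characters_to_binary/1 produces UTF-8 by default.
utf8_binary(Codepoints) ->
    unicode:characters_to_binary(Codepoints).

%% list_to_binary/1 only accepts bytes (0..255), so it raises badarg
%% for any codepoint above 255; that error is caught and reported here.
byte_binary(Codepoints) ->
    try
        list_to_binary(Codepoints)
    catch
        error:badarg -> {error, not_bytes}
    end.
```

For example, the euro sign (codepoint 8364) cannot be stored as a single byte, so byte_binary([8364]) gives {error, not_bytes}, while utf8_binary([8364]) gives the three-byte binary <<226,130,172>>.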
Possible solutions:
- Map strings to lists of codepoints. Map arrays to tuples rather than lists. Objects remain {obj,[…]}.
  - Pros: Terse syntax for strings and arrays, no worse than the Unicode-ignorant mapping
  - Cons: Awkward recursion over arrays, either using a counter and the element/2 BIF, or converting to a real list
- Map strings to binaries containing UTF-8 encoded characters. Keep arrays as lists. Objects remain {obj,[…]}.
  - Pros: Keep terse syntax for strings, with the understanding that the binaries concerned must hold UTF-8 encoded text. Keeps the interface largely unchanged.
  - Cons: Codec needs to perform possibly-redundant Unicode encoding/decoding steps to ensure that the binaries hold UTF-8 even if, say, UTF-32 were the format to be used on the wire
- Map strings to lists of codepoints. Map arrays to {arr,[…]}, as other JSON codecs do. Objects remain {obj,[…]}.
  - Pros: Natural operations on strings, natural operations on arrays (once you strip the outer {arr,…}).
  - Cons: Converting terms to JSON-encodable form is a pain, since you need to wrap each array in your term with the explicit marker atom.
All in all, I can’t decide which is the least distasteful option. I think I prefer the middle option, keeping strings mapped to binaries and viewing them as UTF-8 encoded text, but I really need to get some feedback on the issue.
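To make the middle option concrete, here is a minimal sketch of how decoded terms could be told apart under that mapping. The module name, the function json_type/1, and the use of the atoms true, false and null for the JSON literals are assumptions made for illustration; this is not taken from the codec itself.

```erlang
-module(json_mapping_demo).
-export([json_type/1]).

%% Classify a decoded term under the "middle option":
%% strings are UTF-8 binaries, arrays are lists, objects are {obj, Props}.
json_type(B) when is_binary(B) -> string;
json_type(L) when is_list(L) -> array;
json_type({obj, Props}) when is_list(Props) -> object;
json_type(N) when is_number(N) -> number;
%% Assumed representation of the JSON literals (not stated in the post).
json_type(true) -> boolean;
json_type(false) -> boolean;
json_type(null) -> null.
```

With this mapping, json_type(<<"ABC">>) is string and json_type([65,66,67]) is array. Note that json_type("ABC") is also array, which is exactly the ambiguity that mapping strings to plain lists of codepoints would reintroduce.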
September 13th, 2007
tonyg
I am guest blogging here on behalf of CohesiveFT. We work with the excellent LShift team on our joint venture, RabbitMQ.
I’m here to invite you to a Birds of a Feather session this coming Thursday, August 30th, in central London. It is free, starts at 8pm, and will last for 45 minutes, followed by the traditional breakout discussions over a beer.
Please do take a look at RabbitMQ if you have not yet done so. It’s a commercial open source product, available under the MPL 1.1 and implementing the Advanced Message Queue Protocol. AMQP is a new way to do business messaging (i.e. “what goes in, must come out”). What’s really cool is that like HTTP it is a protocol instead of a language specific API. This should make interoperability between platforms much easier and less painful (business readers: “systems integration projects take less time and success can be predicted more accurately”). For more information, please see my list of links here.
What is the BOF about - and why come? It’s an informal session about RabbitMQ and AMQP, and how they apply within popular environments such as Spring, Mule, Ruby, AJAX, and other messaging protocols such as FIX.
“Informal” means we’ll be encouraging a conversation between people interested in any of these things. We want to hear from you, and from each other, rather than pushing slideware at people.
Come if you want to:
You can find out details of the BOF here. Ideally we ask you to register via the web site, but late arrivals are very welcome - if you turn up, we shall get you in. The BOF is offered as part of the popular EJUG series of tech talks and as a tie-in with the most excellent No Fluff Just Stuff conference.
If you cannot come but want to know more about any of these things then you can email us at info@rabbitmq.com.
Thank you very much - and we hope to see you on Thursday :-)
Posted by Chris on behalf of Alexis Richardson, CohesiveFT.
August 28th, 2007
chris
Our jukebox (mentioned previously) received an update yesterday.
To download the code,
There is a little bit of documentation available, and you can browse the code.
June 21st, 2007
tonyg
About a month ago, I wrote an implementation of RFC 4627, the JSON RFC, for Erlang. I also implemented JSON-RPC over HTTP, in the form of mod_jsonrpc, a plugin for Erlang’s built-in inets httpd. This makes accessing Erlang services from in-browser Javascript very comfortable and easy indeed.
February 17th, 2007
tonyg
RFC 1982 defines a “Serial Number Arithmetic”, for use when you have a fixed number of bits available for some monotonically increasing sequence identifier, such as the DNS SOA record serial number, or message IDs in some messaging protocol. It defines all its operations with respect to some power of two, (2^SERIAL_BITS). It struck me just now that there’s no reason why you couldn’t generalise to any number that simply has two as a factor. You’d simply replace any mention of (2^SERIAL_BITS) by, say, N, and any mention of (2^(SERIAL_BITS-1)) by (N/2). The definitions for addition and comparison still seem to hold just as well.
One of the reasons I was thinking along these lines is that in Erlang, it’s occasionally useful to model a queue in an ETS table or in a process dictionary. If one didn’t mind setting an upper bound on the length of one’s modelled queue, then by judicious use of RFC 1982-style sequence number wrapping, one might ensure that the space devoted to the sequence numbering required of the model remained bounded. Using a generalised variant of RFC 1982 arithmetic, one becomes free to choose any number as the queue length bound, rather than any power of two.
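A minimal sketch of the generalised arithmetic, with an arbitrary even modulus N standing in for 2^SERIAL_BITS. The module and function names are hypothetical, and the handling of the "exactly N/2 apart" case as undefined follows RFC 1982's treatment of the corresponding power-of-two case.

```erlang
-module(serial_demo).
-export([add/3, compare/3]).

%% Add K to serial number S, wrapping modulo an even N.
%% As in RFC 1982, only increments with 0 =< K < N div 2 are allowed.
add(S, K, N) when N rem 2 =:= 0, K >= 0, K < N div 2 ->
    (S + K) rem N.

%% Compare serial numbers A and B (both in 0..N-1) modulo an even N.
%% Returns eq, lt (A precedes B), gt (A follows B), or undefined
%% when the two are exactly N/2 apart.
compare(A, B, N) when N rem 2 =:= 0 ->
    Half = N div 2,
    case (B - A + N) rem N of
        0 -> eq;
        D when D < Half -> lt;
        D when D > Half -> gt;
        _ -> undefined
    end.
```

With N = 10, for instance, add(8, 3, 10) wraps around to 1, and compare(8, 1, 10) is lt because 1 is three steps ahead of 8 once wrapping is taken into account.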
February 17th, 2007
tonyg
We’re proud to announce that the project we’ve been working on for the past few months, RabbitMQ, has been released. RabbitMQ is an AMQP server written using Erlang/OTP. Check it out at http://www.rabbitmq.com/ - or you can go straight to the downloads page for sources and binaries.
February 1st, 2007
tonyg

Sometime around the beginning of July I rewrote our internal jukebox in Erlang. It’s taken me four months to get a round tuit, but new stock has just arrived: here’s the code for our AJAX jukebox web-application, as a tarball. (There’s also a darcs repository: darcs get http://www.lshift.net/~tonyg/erlang-jukebox/.)
To run it, you will need Erlang, Yaws (the Erlang webserver), a modern browser, mpg123, ogg123 (from vorbis-tools), and some MP3 or OGG files to listen to.
I’ve made a start on a bit of documentation and design rationale. Here are a few highlights for the curious:
- You point the jukebox at one or more root URLs, which it then spiders, collecting URLs for MP3 and OGG files, which it puts into a simple flat-file database. Just expose, say, your iTunes folder via Apache, point the jukebox at it, and you’re away.
- It relies on mpg123 and ogg123’s support for playing HTTP-streamed MP3 and OGG files, respectively, rather than retrieving or playing the media itself.
- The user interface is completely written in HTML+Javascript, using prototype for its event binding and XMLHttpRequest support.
- The server side of the application communicates with the user interface solely via JSON-RPC.
- Erlang made a great platform for the server side of the application. Its support for clean, simple concurrency let me design the program in a very natural way.
As part of the development of the program, I built a few stand-alone modules that others might be interested in reusing:
[Update: fixed an issue with json.js, tweaked the use of screen real-estate, and now seems to work with Safari, IE6, and Opera. I’ve changed the tarball link above to point to the new version.]
[Update: fixed a couple of links that had broken over time as the darcs repository evolved.]
November 6th, 2006
tonyg
Earlier today I ran a simple test of Erlang’s process creation and teardown code, resulting in a rough figure of 350,000 process creations and teardowns per second. Attempting a similar workload in Java gives a figure of around 11,000 thread creations and teardowns per second - to my mind, a clear demonstration of one of the main advantages of Erlang’s extremely lightweight processes.
Here’s the Java code I used - see the earlier post for the Erlang code, to compare:
// Java 5 - uses a BlockingQueue.
import java.util.concurrent.*;

public class SpawnTest extends Thread {
    public static void main(String[] args) {
        int M = Integer.parseInt(args.length > 0 ? args[0] : "1");
        int N = Integer.parseInt(args.length > 1 ? args[1] : "1000000");
        int NpM = N / M;
        BlockingQueue queue = new LinkedBlockingQueue();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < M; i++) { new Body(queue, NpM).start(); }
        for (int i = 0; i < M; i++) { try { queue.take(); } catch (InterruptedException ie) {} }
        long stopTime = System.currentTimeMillis();
        System.out.println((NpM * M) / ((stopTime - startTime) / 1000.0));
    }

    public static class Body extends Thread {
        BlockingQueue queue;
        int count;

        public Body(BlockingQueue queue, int count) {
            this.queue = queue;
            this.count = count;
        }

        public void run() {
            if (count == 0) {
                try { queue.put(this); } catch (InterruptedException ie) {}
            } else {
                new Body(queue, count - 1).start();
            }
        }
    }
}
September 10th, 2006
tonyg
My previous post examined Erlang’s speed of process setup and teardown. Here I’m looking at how quickly messages can be sent and received within a single Erlang node. Roughly speaking, I’m seeing 3.4 million deliveries per second one-way, and 1.4 million roundtrips per second (2.8 million deliveries per second) in a ping-pong setup in the same environment as previously - a 2.8GHz Pentium 4 with 1MB cache.
Here’s the code I’m using - time_diff and dotimes aren’t shown, because they’re the same as the code in the previous post:
-module(ipctest).
-export([oneway/0, consumer/0, pingpong/0]).

oneway() ->
    N = 10000000,
    Pid = spawn(ipctest, consumer, []),
    Start = erlang:now(),
    dotimes(N - 1, fun () -> Pid ! message end),
    Pid ! {done, self()},
    receive ok -> ok end,
    Stop = erlang:now(),
    N / time_diff(Start, Stop).

pingpong() ->
    N = 10000000,
    Pid = spawn(ipctest, consumer, []),
    Start = erlang:now(),
    Message = {ping, self()},
    dotimes(N, fun () ->
                   Pid ! Message,
                   receive pong -> ok end
               end),
    Stop = erlang:now(),
    N / time_diff(Start, Stop).

consumer() ->
    receive
        message -> consumer();
        {done, Pid} -> Pid ! ok;
        {ping, Pid} ->
            Pid ! pong,
            consumer()
    end.

%% code omitted - see previous post
September 10th, 2006
tonyg
Very fast indeed.
1> spawntest:serial_spawn(1).
3.58599e+5
That’s telling me that Erlang can create and tear down processes at a
rate of roughly 350,000 Hz. The numbers change slightly - things slow
down - if I’m running the test in parallel:
2> spawntest:serial_spawn(10).
3.48489e+5
3> spawntest:serial_spawn(10).
3.40288e+5
4> spawntest:serial_spawn(100).
3.35983e+5
5> spawntest:serial_spawn(100).
3.36743e+5
[Update: I forgot to mention earlier that the system seems to spend 50% CPU in user and 50% in system time. Very odd! I wonder what the Erlang runtime is doing to spend so much system time?]
Here’s the code for what I’m doing:
-module(spawntest).
-export([serial_spawn/1]).

serial_spawn(M) ->
    N = 1000000,
    NpM = N div M,
    Start = erlang:now(),
    dotimes(M, fun () -> serial_spawn(self(), NpM) end),
    dotimes(M, fun () -> receive X -> X end end),
    Stop = erlang:now(),
    (NpM * M) / time_diff(Start, Stop).

serial_spawn(Who, 0) -> Who ! done;
serial_spawn(Who, Count) ->
    spawn(fun () ->
              serial_spawn(Who, Count - 1)
          end).

dotimes(0, _) -> done;
dotimes(N, F) ->
    F(),
    dotimes(N - 1, F).

time_diff({A1,A2,A3}, {B1,B2,B3}) ->
    (B1 - A1) * 1000000 + (B2 - A2) + (B3 - A3) / 1000000.0 .
This is all on an Intel Pentium 4 running at 2.8GHz, with 1MB cache, on Debian Linux, with erlang_11.b.0-3_all.deb.
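On current Erlang/OTP releases erlang:now/0 is deprecated, so anyone rerunning this test today may want a different clock. The following is a minimal sketch of the same rate calculation using erlang:monotonic_time/1; the module name timing_demo and the function rate/2 are hypothetical, and it assumes a release recent enough to accept the microsecond unit atom.

```erlang
-module(timing_demo).
-export([rate/2]).

%% Run F once and return Count divided by the elapsed time in seconds,
%% i.e. operations per second if F performs Count operations.
rate(Count, F) ->
    Start = erlang:monotonic_time(microsecond),
    F(),
    Stop = erlang:monotonic_time(microsecond),
    %% Guard against a zero-microsecond interval on very short runs.
    Elapsed = max(Stop - Start, 1),
    Count / (Elapsed / 1000000).
```

The body of a benchmark such as serial_spawn/1 would then be wrapped in a fun and passed to rate/2 together with the number of operations it performs.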
September 10th, 2006
tonyg