Posts filed under 'Erlang'

How should JSON strings be represented in Erlang?

Erlang represents strings as lists of (ASCII, or possibly iso8859-1) codepoints. In this regard, it’s weakly typed - there’s no hard distinction between a string, “ABC”, and a list of small integers, [65,66,67]. For example:

Eshell V5.5.4  (abort with ^G)
1> "ABC".
"ABC"
2> [65,66,67].
"ABC"
3> 

Erlang also has a binary type, a simple vector of bytes. In the rfc4627/JSON codec I made for Erlang, I chose to use binaries to represent decoded strings, as suggested by Joe Armstrong.

All was well - until I came to implement UTF8 support after Sam Ruby got the ball rolling. Binaries will no longer work as the chosen mapping for JSON strings, since strings may contain arbitrary characters, including those with codepoints greater than 255.

It has always been the case that the ideal representation for a JSON string is an Erlang string, a list of codepoints. Binaries are really a bit of a compromise. But choosing strings-for-strings puts us straight back in a weakly-typed position: it’s possible in JSON to distinguish between “ABC” and [65,66,67], but it’s not possible to make the same distinction in Erlang. We’d need to alter the way JSON arrays are represented to compensate.

Possible solutions:

  • Map strings to lists of codepoints. Map arrays to tuples rather than lists. Objects remain {obj,[…]}.
    • Pros: Terse syntax for strings and arrays, no worse than the Unicode-ignorant mapping
    • Cons: Awkward recursion over arrays, either using a counter and the element/2 BIF, or converting to a real list

  • Map strings to binaries containing UTF-8 encoded characters. Keep arrays as lists. Objects remain {obj,[…]}.

    • Pros: Keep terse syntax for strings, with the understanding that the binaries concerned must hold UTF8-encoded text. Keeps the interface largely unchanged.
    • Cons: Codec needs to perform possibly-redundant Unicode encoding/decoding steps to ensure that the binaries hold UTF8 even if, say, UTF32 were the format to be used on the wire

  • Map strings to lists of codepoints. Map arrays to {arr,[…]}, as other JSON codecs do. Objects remain {obj,[…]}.

    • Pros: Natural operations on strings, natural operations on arrays (once you strip the outer {arr,…}).
    • Cons: Converting terms to JSON-encodable form is a pain, since you need to wrap each array in your term with the explicit marker atom.

All in all, I can’t decide which is the least distasteful option. I think I prefer the middle option, keeping strings mapped to binaries and viewing them as UTF-8 encoded text, but I really need to get some feedback on the issue.

8 comments September 13th, 2007 tonyg

Invitation to AMQP and RabbitMQ Birds of a Feather session

I am guest blogging here on behalf of CohesiveFT. We work with the excellent LShift team on our joint venture, RabbitMQ.

I’m here to invite you to a Birds of a Feather session this coming Thursday, August 30th, at 8pm, in central London. It is FREE and will last for 45 minutes starting at 8pm, followed by the traditional breakout discussions over a beer. Please do take a look at RabbitMQ if you have not yet done so. It’s a commercial open source product, available under the MPL 1.1 and implementing the Advanced Message Queue Protocol. AMQP is a new way to do business messaging (ie: “what goes in, must come out“). What’s really cool is that like HTTP it is a protocol instead of a language specific API. This should make interoperability between platforms much easier and less painful (business readers: “systems integration projects take less time and success can be predicted more accurately”). For more information, please see my list of links here.

What is the BOF about - and why come? It’s an informal session about RabbitMQ and AMQP, and how they apply within popular environments such as Spring, Mule, Ruby, AJAX, and other messaging protocols such as FIX.

“Informal” means we’ll be encouraging a conversation between people interested in any of these things. We want to hear from you, and from each other, rather than pushing slideware at people.

Come if you want to:

You can find out details of the BOF here. Ideally we ask you to register via the web site, but late arrivals are very welcome - if you turn up, we shall get you in. The BOF is offered as part of the popular EJUG series of tech talks and as a tie-in with the most excellent No Fluff Just Stuff conference.

If you cannot come but want to know more about any of these things then you can email us at info@rabbitmq.com.

Thank-you very much - and we hope to see you on Thursday :-)

Posted by Chris on behalf of Alexis Richardson, CohesiveFT.

2 comments August 28th, 2007 chris

Updated AJAX Erlang Jukebox

Erlang Jukebox ScreenshotOur jukebox (mentioned previously) received an update yesterday.

To download the code,

There is a little bit of documentation available, and you can browse the code.

2 comments June 21st, 2007 tonyg

JSON and JSON-RPC for Erlang

About a month ago, I wrote an implementation of RFC 4627, the JSON RFC, for Erlang. I also implemented JSON-RPC over HTTP, in the form of mod_jsonrpc, a plugin for Erlang’s built-in inets httpd. This makes accessing Erlang services from in-browser Javascript very comfortable and easy indeed.

Continue Reading 7 comments February 17th, 2007 tonyg

RFC 1982 limits itself to powers of two unnecessarily

RFC 1982 defines a “Serial Number Arithmetic”, for use when you have a fixed number of bits available for some monotonically increasing sequence identifier, such as the DNS SOA record serial number, or message IDs in some messaging protocol. It defines all its operations with respect to some power of two, (2^SERIAL_BITS). It struck me just now that there’s no reason why you couldn’t generalise to any number that simply has two as a factor. You’d simply replace any mention of (2^SERIAL_BITS) by, say, N, and any mention of (2^(SERIAL_BITS-1)) by (N/2). The definitions for addition and comparison still seem to hold just as well.

One of the reasons I was thinking along these lines is that in Erlang, it’s occasionally useful to model a queue in an ETS table or in a process dictionary. If one didn’t mind setting an upper bound on the length of one’s modelled queue, then by judicious use of RFC 1982-style sequence number wrapping, one might ensure that the space devoted to the sequence numbering required of the model remained bounded. Using a generalised variant of RFC 1982 arithmetic, one becomes free to choose any number as the queue length bound, rather than any power of two.

2 comments February 17th, 2007 tonyg

Rabbits, rabbits, rabbits

We’re proud to announce that the project we’ve been working on for the past few months, RabbitMQ, has been released. RabbitMQ is an AMQP server written using Erlang/OTP. Check it out at http://www.rabbitmq.com/ - or you can go straight to the downloads page for sources and binaries.

1 comment February 1st, 2007 tonyg

An AJAX Erlang Jukebox

Erlang Jukebox Screenshot

Sometime around the beginning of July I rewrote our internal jukebox in Erlang. It’s taken me four months to get a round tuit, but new stock has just arrived: here’s the code for our AJAX jukebox web-application, as a tarball. (There’s also a darcs repository: darcs get http://www.lshift.net/~tonyg/erlang-jukebox/.) Click on the image for a screenshot.

To run it, you will need Erlang, Yaws (the Erlang webserver), a modern browser, mpg123, ogg123 (from vorbis-tools), and some MP3 or OGG files to listen to.

I’ve made a start on a bit of documentation and design rationale. Here are a few highlights for the curious:

  • You point the jukebox at one or more root URLs, which it then spiders, collecting URLs for MP3 and OGG files, which it puts into a simple flat-file database. Just expose, say, your iTunes folder via Apache, point the Jukebox at it, and you’re away.

  • It relies on mpg123 and ogg123’s support for playing HTTP-streamed MP3 and OGG files, respectively, rather than retrieving or playing the media itself.

  • The user interface is completely written in HTML+Javascript, using prototype for its event binding and XMLHttpRequest support.

  • The server side of the application communicates with the user interface solely via JSON-RPC.

  • Erlang made a great platform for the server side of the application. Its support for clean, simple concurrency let me design the program in a very natural way.

As part of the development of the program, I built a few stand-alone modules that others might be interested in reusing:

[Update: fixed an issue with json.js, tweaked the use of screen real-estate, and now seems to work with Safari, IE6, and Opera. I’ve changed the tarball link above to point to the new version.]

[Update: fixed a couple of links that had broken over time as the darcs repository evolved.]

17 comments November 6th, 2006 tonyg

Erlang processes vs. Java threads

Earlier today I ran a simple test of Erlang’s process creation and teardown code, resulting in a rough figure of 350,000 process creations and teardowns per second. Attempting a similar workload in Java gives a figure of around 11,000 thread creations and teardowns per second - to my mind, a clear demonstration of one of the main advantages of Erlang’s extremely lightweight processes.

Here’s the Java code I used - see the earlier post for the Erlang code, to compare:

    // Java 5 - uses a BlockingQueue.
    import java.util.concurrent.*;

    public class SpawnTest extends Thread {
        public static void main(String[] args) {
            int M = Integer.parseInt(args.length > 0 ? args[0] : "1");
            int N = Integer.parseInt(args.length > 1 ? args[1] : "1000000");
            int NpM = N / M;
            BlockingQueue queue = new LinkedBlockingQueue();
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < M; i++) { new Body(queue, NpM).start(); }
            for (int i = 0; i < M; i++) { try { queue.take(); } catch (InterruptedException ie) {} }
            long stopTime = System.currentTimeMillis();
            System.out.println((NpM * M) / ((stopTime - startTime) / 1000.0));
        }

        public static class Body extends Thread {
            BlockingQueue queue;
            int count;
            public Body(BlockingQueue queue, int count) {
                this.queue = queue;
                this.count = count;
            }
            public void run() {
                if (count == 0) {
                    try { queue.put(this); } catch (InterruptedException ie) {}
                } else {
                    new Body(queue, count - 1).start();
                }
            }
        }
    }

21 comments September 10th, 2006 tonyg

How fast can Erlang send messages?

My previous post examined Erlang’s speed of process setup and teardown. Here I’m looking at how quickly messages can be sent and received within a single Erlang node. Roughly speaking, I’m seeing 3.4 million deliveries per second one-way, and 1.4 million roundtrips per second (2.8 million deliveries per second) in a ping-pong setup in the same environment as previously - a 2.8GHz Pentium 4 with 1MB cache.

Here’s the code I’m using - time_diff and dotimes aren’t shown, because they’re the same as the code in the previous post:

-module(ipctest).
-export([oneway/0, consumer/0, pingpong/0]).

oneway() ->
    N = 10000000,
    Pid = spawn(ipctest, consumer, []),
    Start = erlang:now(),
    dotimes(N - 1, fun () -> Pid ! message end),
    Pid ! {done, self()},
    receive ok -> ok end,
    Stop = erlang:now(),
    N / time_diff(Start, Stop).

pingpong() ->
    N = 10000000,
    Pid = spawn(ipctest, consumer, []),
    Start = erlang:now(),
    Message = {ping, self()},
    dotimes(N, fun () ->
               Pid ! Message,
               receive pong -> ok end
           end),
    Stop = erlang:now(),
    N / time_diff(Start, Stop).

consumer() ->
    receive
    message -> consumer();
    {done, Pid} -> Pid ! ok;
    {ping, Pid} ->
        Pid ! pong,
        consumer()
    end.

%% code omitted - see previous post

2 comments September 10th, 2006 tonyg

How fast can Erlang create processes?

Very fast indeed.

1> spawntest:serial_spawn(1). 
3.58599e+5

That’s telling me that Erlang can create and tear down processes at a rate of roughly 350,000 Hz. The numbers change slightly - things slow down - if I’m running the test in parallel:

2> spawntest:serial_spawn(10).
3.48489e+5
3> spawntest:serial_spawn(10).
3.40288e+5

4> spawntest:serial_spawn(100).
3.35983e+5
5> spawntest:serial_spawn(100).
3.36743e+5

[Update: I forgot to mention earlier that the system seems to spend 50% CPU in user and 50% in system time. Very odd! I wonder what the Erlang runtime is doing to spend so much system time?]

Here’s the code for what I’m doing:

-module(spawntest).
-export([serial_spawn/1]).

serial_spawn(M) ->
    N = 1000000,
    NpM = N div M,
    Start = erlang:now(),
    dotimes(M, fun () -> serial_spawn(self(), NpM) end),
    dotimes(M, fun () -> receive X -> X end end),
    Stop = erlang:now(),
    (NpM * M) / time_diff(Start, Stop).

serial_spawn(Who, 0) -> Who ! done;
serial_spawn(Who, Count) ->
    spawn(fun () ->
          serial_spawn(Who, Count - 1)
      end).

dotimes(0, _) -> done;
dotimes(N, F) ->
    F(),
    dotimes(N - 1, F).

time_diff({A1,A2,A3}, {B1,B2,B3}) ->
    (B1 - A1) * 1000000 + (B2 - A2) + (B3 - A3) / 1000000.0 .

This is all on an Intel Pentium 4 running at 2.8GHz, with 1MB cache, on Debian linux, with erlang_11.b.0-3_all.deb.

7 comments September 10th, 2006 tonyg

Next Posts Previous Posts

Calendar

August 2008
M T W T F S S
« Jul    
 123
45678910
11121314151617
18192021222324
25262728293031

Posts by Month

Posts by Category