technology from back to front

Archive for February, 2009

Streamlining HTTP

HTTP/1.1 is a lovely protocol. Text-based, sophisticated, flexible. It
does tend toward the verbose though. What if we wanted to use HTTP’s
semantics in a very high-speed messaging situation? How could we
mitigate the overhead of all those headers?

Now, bandwidth is pretty cheap: cheap enough that for most
applications the kind of approach I suggest below is ridiculously far
over the top. Some situations, though, really do need a more efficient
protocol: I’m thinking of people having to consume the [OPRA][] feed,
which is fast approaching 1 million messages per second ([1][], [2][],
[3][]). What if, in some bizarre situation, HTTP was the protocol used
to deliver a full OPRA feed?

### Being Stateful

Instead of having each HTTP request start with a clean slate after the
previous request on a given connection has been processed, how about
giving connections a memory?

Let’s invent a syntax for HTTP that is easy to translate back to
regular HTTP syntax, but that [avoids repeating ourselves][DRY] quite
so much.

Each line starts with an opcode and a colon. The rest of the line is
interpreted depending on the opcode. Each opcode-line is terminated
with CRLF.

V:HTTP/1.x Set HTTP version identifier.
B:/some/base/url Set base URL for requests.
M:GET Set method for requests.
<:somename Retrieve a named configuration
>:somename Give the current configuration a name
H:Header: value Set a header
-:/url/suffix Issue a bodyless request
+:/url/suffix 12345 Issue a request with a body

Opcodes `V`, `B`, `M` and `H` are hopefully self-explanatory. I’ll
explore `<` and `>` below. The opcodes `-` and `+` actually complete
each request and tell the server to process the message.

Opcode `-` takes as its argument a URL fragment that gets appended to
the base URL set by opcode `B`. Opcode `+` does the same, but also
takes an ASCII `Content-Length` value, which tells the server to read
that many bytes after the CRLF of the `+` line, and to use the bytes
read as the entity body of the HTTP request.

`Content-Length` is a slightly weird header, more properly associated
with the entity body than the headers proper, which is why it gets
special treatment. (We could also come up with a syntax for indicating
chunked transfer encoding for the entity body.)

As an example, let’s encode the following `POST` request:

POST /someurl HTTP/1.1
Content-Type: text/plain
Accept-Encoding: identity
Content-Length: 13

hello world

Encoded, this becomes

H:Content-Type: text/plain
H:Accept-Encoding: identity
+: 13
hello world

Not an obvious improvement. However, consider issuing 100 copies of
that same request on a single connection. With plain HTTP, all the
headers are repeated; with our encoded HTTP, the only part that is
repeated is:

+: 13
hello world

Instead of sending (151 * 100) = 15100 bytes, we now send 130 + (20 *
100) = 2130 bytes.

The scheme as described so far takes care of the unchanging parts of
repeated HTTP requests; for the changing parts, such as `Accept` and
`Referer` headers, we need to make use of the `<` and `>`
opcodes. Before I get into that, though, let’s take a look at how the
scheme so far might work in the case of OPRA.

### Measuring OPRA

Each OPRA quote update is [on average 66 bytes
long](, making for
around 63MB/s of raw content.

Let’s imagine that each delivery appears as a separate HTTP request:

POST /receiver HTTP/1.1
Content-Type: application/x-opra-quote
Accept-Encoding: identity
Content-Length: 66


That’s 213 bytes long: an overhead of 220% over the raw message

Encoded using the stateful scheme above, the first request appears on
the wire as

H:Content-Type: application/x-opra-quote
H:Accept-Encoding: identity
+: 66

and subsequent requests as

+: 66

for an amortized per-request size of 73 bytes: a much less problematic
overhead of 11%. In summary:

Encoding Bytes per message body Per-message overhead (bytes) Size increase over raw content Bandwidth at 1M msgs/sec
Plain HTTP 66 147 220% 203.1 MBy/s
Encoded HTTP 66 7 11% 69.6 MBy/s

Using plain HTTP, the feed doesn’t fit on a gigabit ethernet. Using
our encoding scheme, it does.

Besides the savings in terms of bandwidth, the encoding scheme could
also help with saving CPU. After processing the headers once, the
results of the processing could be cached, avoiding unnecessary
repetition of potentially expensive calculations such as routing,
authentication, and authorisation.

### Almost-identical requests

Above, I mentioned that some headers changed, while others stayed the
same from request to request. The `<` and `>` opcodes are intended to
deal with just this situation.

The `>` opcode stores the current state in a named register, and the
`<` opcode loads the current state from a register. Headers that don't
change between requests are placed into a register, and each request
loads from that register before setting its request-specific headers.

To illustrate, imagine the following two requests:

GET / HTTP/1.1
Cookie: key=value
Accept: HTTP Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

GET /style.css HTTP/1.1
Cookie: key=value
Accept: text/css,*/*;q=0.1

One possible encoding is:

H:Cookie: key=value
H:Accept: HTTP Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
H:Accept: text/css,*/*;q=0.1

By using `<:config1`, the second request reuses the stored settings
for the method, base URL, HTTP version, and `Host` and `Cookie`

### It’ll never catch on, of course — and I don’t mean for it to

Most applications of HTTP do fine using ordinary HTTP syntax. I’m not
suggesting changing HTTP, or trying to get an encoding scheme like
this deployed in any browser or webserver at all. The point of the
exercise is to consider how low one might make the bandwidth overheads
of a text-based protocol like HTTP for the specific case of a
high-speed messaging scenario.

In situations where the semantics of HTTP make sense, but the syntax
is just too verbose, schemes like this one can be useful on a
point-to-point link. There’s no need for global support for an
alternative syntax, since people who are already forming very specific
contracts with each other for the exchange of information can choose
to use it, or not, on a case-by-case basis.

Instead of specifying a whole new transport protocol for high-speed
links, people can reuse the considerable amount of work that’s gone
into HTTP, without paying the bandwidth price.

### Aside: AMQP 0-8 / 0-9

Just as a throwaway comparison, I computed the minimum possible
overhead for sending a 66-byte message using AMQP 0-8 or 0-9. Using a
single-letter queue name, “`q`”, the overhead is 69 bytes per message,
or 105% of the message body. For our OPRA example at 1M messages per
second, that works out at 128.7 megabytes per second, and we’re back
over the limit of a single gigabit ethernet again. Interestingly,
despite AMQP’s binary nature, its overhead is much higher than a
simple syntactic rearrangement of a text-based protocol in this case.

### Conclusion

We considered the overhead of using plain HTTP in a high-speed
messaging scenario, and invented a simple alternative syntax for HTTP
that drastically reduces the wasted bandwidth.

For the specific example of the OPRA feed, the computed bandwidth
requirement of the experimental syntax is only 11% higher than the raw
data itself — nearly 3 times less than ordinary HTTP.


Note: [this][3] is a local mirror of [this](




Jeff Lindsay on Web Hooks

From Jason Salas‘s interview with Jeff Lindsay, the guy who invented the term web hooks:

“For example, the Facebook Platform, although pretty complicated and full of their own technology, is still at the core based on web hooks. They call out to a user-defined external web application and integrate that with their application. That’s quite a radically different use of web hooks compared to the way people think of them in relation to XMPP.”

That’s an interesting point: while nothing is stopping XMPP from being used this way, it’s not how it is currently used. XMPP seems to be gaining some adoption for asynchronous or messaging-style tasks, but I haven’t seen much in the way of generalised RPC over XMPP yet. (Perhaps I’ve overlooked something obvious?) HTTP, on the other hand, is being used both for asynchronous operations (HTTP push, where the HTTP response has no body, and serves as an acknowledgement of receipt or completion) and for synchronous RPC-like operations (JSON-RPC, SOAP, CGI, ordinary static web pages).

Web hooks can be seen as an approach to making it easier for people to participate in the world of distributed objects that is HTTP — a worthy goal.


Come to LShift!

For some reason we are busier than ever at the moment, and we’re even more keen than usual to recruit. There are recruitment details on our website, but I wanted to put it into my own words, as something I can send to my friends when I’m telling them what we’re about and why you want to come and work for us.

What we do: LShift is primarily a contract coding shop: we rent out our software expertise to who needs it, mostly by the hour but occasionally on a fixed price basis. We’re not centered around any particular specialization; though a lot of jobs we get are primarily web-based applications, we’ve done a variety of other jobs from PC device management software to our message queueing system RabbitMQ. And we don’t fixate on any one language: just now there are projects in Java, Python, C#, a little C, and Erlang; Java’s the biggest as you’d expect, but in the past we’ve used Scheme and an upcoming project may end up in Haskell. We’ve done jobs that lasted only a week, while others have run for years, building up strong relationships with clients who come back to us again and again when they need work done. We are also branching out into in-house projects – RabbitMQ as linked above, and Expro. This variety is reflected in what we look for from you: we don’t require knowledge of any specific programming language or environment, we want to know that you’re flexible enough to pick up the skills that may be needed in a particular situation, and to pick the right tool for the job.

How we do it: unlike many such businesses, we don’t try to hide the technical side from the client side but put it front and centre. Every developer we hire is expected to be able to lead a project: the lead is the first person the client talks to about how the project is going, and guiding the project to completion is in their hands. Because one person is in control of technical, project management, and client-facing roles, these aspects pull together instead of against each other: developers aren’t pulling against management, the client gets accurate information on what can be done and how, and you get to find out the client’s needs from the horse’s mouth. This means that you have to be happy taking on responsibility to work here; for many, it’s a welcome change.

What it’s like working here: frankly I can’t imagine there are many better places to work. Come to interview, and you’ll have a chance to meet and talk to everyone who works here and gain your own impressions of what it’s like, once in the office and then again in the pub! If you’re taken on, you’ll be working in a room full of smart and knowledgeable people who you will want to learn from and who will want to learn from you, in a friendly and productive environment which values getting things done and doing things right. Send your CV and if you’ve got the skills you could soon be doing something a lot more fun and varied.

Paul Crowley

EvServer, part2: Rabbit and Comet

Few days ago I introduced EvServer. In this post I’ll present a simple EvServer example.

EvServer is a normal WSGI server, but with one additional feature. Instead of blocking in your WSGI application you yield a file descriptor to the server. On descriptor activity the server will continue your WSGI app till it yields again.

I’ll show how to wait for AMQP messages inside the WSGI application and how to push them up to the browser. If you can’t wait till the end of the post, please feel free to view the online demo(outdated) of the code described below.

Consuming (from) the Rabbit

To consume AMQP messages we need a python amqp library (version 0.6). Barry Pederson rewrote it lately and I must admit I really like the code now. Great work Barry! Usually, the code that consumes messages (a subscriber) looks like this:

import amqplib.client_0_8 as amqp

def callback(msg):     print msg.body

def main():     conn = amqp.Connection('localhost', userid='guest', password='guest')     ch =     ch.access_request('/data', active=True, read=True)     ch.exchange_declare('myfan', 'fanout', auto_delete=True)     qname, _, _ = ch.queue_declare()     ch.queue_bind(qname, 'myfan')     ch.basic_consume(qname, callback=callback)     while ch.callbacks:         ch.wait() # blocking here!     ch.close()     conn.close()

The problem is that we need to use it in an Asynchronous WSGI application which is non blocking. The conversion is not very difficult. First, we need to identify the socket descriptor on which the wait() blocks, then we need to make it non-blocking. This solution is quite hackish, but it’s all we can do. Maybe this hack should be put in the Hall of Fame of Dirty Hacks…

    sd = conn.transport.sock

At this point, a bad thing happens when ch.wait() tries to block on a non-blocking descriptor. We’d expect the socket to raise socket.error: (11, ‘Resource temporarily unavailable’). Actually, it fails with quite a strange exception, but fortunately it doesn’t break anything inside py-amqplib:

Traceback (most recent call last):
  File "/home/majek/amqplib-0.6/amqplib/client_0_8/", line 201, in _wait_method
    self.method_reader.read_method() TypeError: 'NoneType' object is not iterable

We use this exception to identify when the library wants to block. Having this knowledge, the main loop of a non-blocking consumer is now easy to write:

    while True:
            while True: # until the exception
        except TypeError:

        <block till activity on conn.transport.sock>

At this point modifying the code to became a valid AWSGI application is straightforward. Full code can be found in the EvServer examples directory. Here’s the simplified version:

def wsgi_subscribe(environ, start_response):
    start_response("200 OK", [('Content-type','text/plain')])

    msgs = []
    def callback(msg):

    <setup connection, channel, queue>

        while ch.callbacks:
                while True: # until exception
            except (TypeError,), e:

            yield 'got messages: %r\n' % (msgs,)
            while msgs: msgs.pop() # empty the queue

            # block!
            yield environ['x-wsgiorg.fdevent.readable'](conn.transport.sock)
    except GeneratorExit:

Introducing the Comet library

Michael Carter has figured out how to make Comet – long lasting HTTP push connection – work on all major browsers. As a part of EvServer I distribute a simple javascript Comet library based on his work. The basic API consists of one method that creates a comet channel: comet_connection(url, callback).

<script src="./static/comet.js" type="text/javascript"></script>

function user_callback(data){
    alert('got message: ' + data);
close_comet_function = comet_connection(url, user_callback);

There are many different encapsulation types for Comet messages and the format of the messages emitted from the server depends sensitively on the browser type. Fortunately the WSGI application can remain quite simple by using the evserver.transports wrapper:

import evserver.transports as transports

def simplest_comet_application(environ, start_response):     t = transports.get_transport(<comet transport family>)     start_response('200 OK', t.get_headers())     yield t.start()     yield t.write('fist message!')     yield t.write('second message!')

Please, don’t build a chat

I think we’re all sick of yet-another-chat examples. I propose to build something similar, but instead of broadcasting chat messages let’s broadcast user-agent and referrer http fields of people who view the site.

The html site is going to be dead simple.

On the server side we will serve comet.js file and the main HTML file. While serving the HTML, we’ll send an AMQP message. We also need to create a Comet channel URL. Let’s use Django – in total we need three Django views, one of which can be generic.

We need: python2.5, rabbitmq, evserver, py-amqplib-0.6, django and my django project code.

sudo apt-get install erlang-nox setuptools subversion
sudo dpkg -i rabbitmq-server_1.5.1-1_all.deb

sudo easy_install evserver-*.egg

tar xvzf amqplib-0.6.tgz
cd amqplib-0.6
sudo python install
cd ..

tar xzvf Django-1.0.2-final.tar.gz
cd Django-1.0.2-final
sudo python install

cd ..

svn checkout django_agentpush

After these steps, to run the code just type:

cd django_agentpush
./ runevserver
PYTHONPATH=".." DJANGO_SETTINGS_MODULE=django_agentpush.settings evserver --framework=django

Currently you should be able to view online the working code(outdated) (source code is here).

I’ve shown how to build a comet application that waits for messages from RabbitMQ. Using these tools you should be able to build ambitious real-time collaboration web applications, like EtherPad. I’m starting to believe that the hardest part in real-time web apps is the javascript.

LShift is recruiting! (thanks to all of you that already have contacted us, we’re still reviewing your CVs)


Erlang/OTP’s global module

Erlang/OTP’s global module helps with atomic assignment of names for processes in a distributed Erlang cluster. It makes sure that only a single process at a time holds any given name, across all connected nodes. Unlike the local name registration function, names aren’t limited to being atoms: with global, they can be any term at all.

To see global‘s conflict-resolution in action, we need to register a name on two nodes not initially connected, and then make them aware of each other. The system will pick one registration to survive, and will terminate the other registration.

First, register the name “a” on each of two nodes (started with erl -sname one and erl -sname two, respectively). On node one:

Eshell V5.6.2  (abort with ^G)
(one@walk)1> global:register_name(a, self()).
(one@walk)2> global:whereis_name(a).

We see that the name was registered successfully (the call to register_name returned yes), and that when looked up, a pid (the pid of the shell process) is returned, as we would expect. Now, the same on node two:

Eshell V5.6.2  (abort with ^G)
(two@walk)1> global:register_name(a, self()).
(two@walk)2> global:whereis_name(a).

Again, we see it succeeding. Note that each node has successfully registered the “global” name “a”. This is because they are unaware of each other. Once they’re connected, Erlang/OTP will automatically resolve the situation. By default, it does this by terminating one of the two contending processes.

Let’s see what happens. Connect the two nodes together, by pinging one from the other — here, pinging node two from node one:

(one@walk)3> net_adm:ping(two@walk).
=INFO REPORT==== 13-Feb-2009::03:05:22 ===
global: Name conflict terminating {a,<5744.37.0>}

(one@walk)4> global:whereis_name(a).

See that the termination of one of the contenders is reported with a message in the system log. It was the registration on node two that was terminated, and the registration on node one that survived. Here’s what we see on node two:

** exception error: killed
(two@walk)3> global:whereis_name(a).
(two@walk)4> node(global:whereis_name(a)).   

Node two’s registered process has been killed. When we then ask about the registration for the name “a”, we see a pid from node one.

Finally, we’ll try registering the name for a second time:

(two@walk)5> global:register_name(a, self()).

It answers no because there’s already a registration that it knows about in the system. The same no answer would have been returned if we’d tried the same thing on node one instead.


Fast ACID persistence manager for Jackrabbit

I tend to think of Oracles BDB java edition as the solution to all storage performance problems: The least performance sacrifice you can possibly make to get ACID storage in java. So when as our automated configuration import for Magnolia got slower and slower, it was naturally what I turned to.

Its particularly suited in this case, because Jackrabbit has a very simple contract for a persistence manager to implement, and handles indexing (and hence searching) separately, so there’s no need to handle query optimization.

Unfortunately, it didn’t immediately prove to be reliable, and I had to shelve it until I had some more time to work on it. In fact, the problems turn out to be simple – they just benefited from fresh eyes.

I haven’t wanted to delve much into Jackrabbit, so I have ignored how it uses the persistence manager. does all its work in a transaction. This is probably enough to make Jackrabbit ACID, however either Magnolia or Jackrabbit have this wrong – you can certainly leave your Magnolia instance in an inconsistent state if you interrupt the application.

I’ve mostly noted this as various strategies to get the BDB environment shut down properly have failed. You want to share an environment between multiple workspaces, because Magnolia sets up lots of them, and BDB’s cache would be very inefficient if it was split up into lots of little segments. Its not easy to use finalize to clean up the environment – I use the collections interface which I have to pass the environment to directly, and of course BDB runs several threads, which mean an environment won’t be garbage collected until its closed. In the end, I’ve settled on tracking the set of workspaces using the environment. I also have a fall-back – a ServletContextListener that will shut down any open environments when the servlet context shuts down.

I still need to write some decent documentation for this, but there is the mercurial repository, if you want to try it out.Its a maven project.

It is fast, but I haven’t measured the performance. Because of the above mentioned problems with Magnolia and transactions, you may not want to use it for your production magnolia author instance, but its great for developers and public instances. A remotely accessible service would be at least as reliable as using an external DBMS. Its already no worse than using an embedded DB or other embedded persistence manager.


EvServer, Introduction: The tale of a forgotten feature

Long long time ago there was a WSGI spec. This document described a lot of interesting stuff. Between other very important paragraphs you could find a hidden gem:

[...] applications will usually return an iterator (often
a generator-iterator) that produces the output in a block-by-block
fashion. These blocks may be broken to coincide with mulitpart
boundaries (for “server push”), or just before time-consuming
tasks (such as reading another block of an on-disk file). [...]

It means that all WSGI conforming servers should be able to send multipart http responses. WSGI clock application theoretically could be written like that:

def clock_demo(environ, start_response):
    start_response("200 OK", [('Content-type','text/plain')])
    for i in range(100):
        yield "%s\n" % (,)

The problem is that way of programming just doesn’t work well. It’s not scalable, requires a lot of threads and can eat a lot of resources. That’s why the feature has been forgotten.

Until May 2008, when Christopher Stawarz reminded us this feature and proposed an enhancement to it. He suggested, that instead of blocking, like time.sleep(1), inside the code WSGI application should return a file descriptor to server. When an event happens on this descriptor, the WSGI app will be continued. Here’s equivalent of the previous code, but using the extension. With appropriate server this could be scalable and work as expected:

def clock_demo(environ, start_response):
    start_response("200 OK", [('Content-type','text/plain')])
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for i in range(100):
            yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
            yield "%s\n" % (,)
    except GeneratorExit:

So I created a server that supports it: EvServer the Asynchronous Python WSGI Server


I did my best to implement the latest of the three versions of Chris proposal. The code is based on my hacked together implementation of a very similar project django-evserver, which was created way before the extension was invented and before I knew about the WSGI multipart feature.

EvServer is very small and lightweight , the core is about 1000 lines of Python code. Apparently, due to the fact that EvServer is using ctypes bindings to libevent, it’s quite fast.

I did a basic test to see how fast it is. The methodology is very dumb, I just measure the number of handled WSGI requests per second, so as a result I receive only the server speed. The difference is clearly visible:

evserver 4254
spawning with threads 1237
spawning without threads 2200
cherrypy wsgi server 1700





So what really EvServer is?

  • It’s yet another WSGI server.
  • It’s very low levelish, the WSGI application has control on almost every http header.
  • It’s great for building COMET applications.
  • It’s fast and lightweight.
  • It’s feature complete.
  • Internally it’s asynchronous.
  • It’s simple to use.
  • It’s 100% written in Python, though it uses libevent library, which is in C.

What EvServer is not?

  • Unfortunately, it’s not mature yet.
  • It’s Linux and Mac only.
  • It’s not fully blown, Apache-like web server.
  • Currently it’s Python 2.5 only.


Admittedly using raw WSGI for regular web applications is a bit inconvenient. Fortunately decent web frameworks support passing iterators from the web application down to the WSGI server, throughout all the framework. On my list of frameworks that support iterators you can find: Django and


Django 1.0 supports returning iterators from views. This is Django code for the clock example:

def django_clock(request):
    def iterator():
        sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            while True:
                yield request.environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                yield '%s\n' % (,)
        except GeneratorExit:
    return HttpResponse(iterator(), mimetype="text/plain")

The problem is that this code is not going to work using the standard ./manage runserver development server. Fortunately, it’s very easy to integrate EvServer with Django, you only need to put that into

    'evserver',             # <<< THIS LINE enables runevserver command)

Now you can test your app using ./manage runevserver.

Full source code for the example django application is in the EvServer examples directory.

From the 0.3 version supports returning iterators. You can see it in action here:

class webpy_clock:
    def GET(self, name):
        web.header('Content-Type','text/plain', unique=True)
        environ = web.ctx.environ
        def iterable():
            sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # any udp socket
                while True:
                    yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                    yield "%s\n" % (,)
            except GeneratorExit:
        return iterable()

The full source code is included in EvServer example directory . You can run this code using command:

evserver --exec "import examples.framework_webpy; application = examples.framework_webpy.application"


I haven’t discussed any useful scenario yet, I’ll try to do that in the future post. I’m thinking of some interesting uses for EvServer – pushing the data to the browser using COMET.

LShift is recruiting!




You are currently browsing the LShift Ltd. blog archives for February, 2009.



2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us