HTTP/1.1 is a lovely protocol. Text-based, sophisticated, flexible. It does tend toward the verbose though. What if we wanted to use HTTP’s semantics in a very high-speed messaging situation? How could we mitigate the overhead of all those headers?
Now, bandwidth is pretty cheap: cheap enough that for most applications the kind of approach I suggest below is ridiculously far over the top. Some situations, though, really do need a more efficient protocol: I’m thinking of people having to consume the OPRA feed, which is fast approaching 1 million messages per second (1, 2, 3). What if, in some bizarre situation, HTTP was the protocol used to deliver a full OPRA feed?
Instead of having each HTTP request start with a clean slate after the previous request on a given connection has been processed, how about giving connections a memory?
Let’s invent a syntax for HTTP that is easy to translate back to regular HTTP syntax, but that avoids repeating ourselves quite so much.
Each line starts with an opcode and a colon. The rest of the line is interpreted depending on the opcode. Each opcode-line is terminated with CRLF.
| Opcode line | Meaning |
|---|---|
| V:HTTP/1.x | Set HTTP version identifier. |
| B:/some/base/url | Set base URL for requests. |
| M:GET | Set method for requests. |
| <:somename | Retrieve a named configuration. |
| >:somename | Give the current configuration a name. |
| H:Header: value | Set a header. |
| -:/url/suffix | Issue a bodyless request. |
| +:/url/suffix 12345 | Issue a request with a body. |
Opcodes V, B, M and H are hopefully self-explanatory. I’ll
explore < and > below. The opcodes - and + actually complete
each request and tell the server to process the message.
Opcode - takes as its argument a URL fragment that gets appended to
the base URL set by opcode B. Opcode + does the same, but also
takes an ASCII Content-Length value, which tells the server to read
that many bytes after the CRLF of the + line, and to use the bytes
read as the entity body of the HTTP request.
Content-Length is a slightly weird header, more properly associated
with the entity body than the headers proper, which is why it gets
special treatment. (We could also come up with a syntax for indicating
chunked transfer encoding for the entity body.)
As an example, let’s encode the following POST request:
POST /someurl HTTP/1.1
Host: relay.localhost.lshift.net:8000
Content-Type: text/plain
Accept-Encoding: identity
Content-Length: 13

hello world
Encoded, this becomes
V:HTTP/1.1
B:/someurl
M:POST
H:Host: relay.localhost.lshift.net:8000
H:Content-Type: text/plain
H:Accept-Encoding: identity
+: 13
hello world
Not an obvious improvement. However, consider issuing 100 copies of that same request on a single connection. With plain HTTP, all the headers are repeated; with our encoded HTTP, the only part that is repeated is:
+: 13
hello world
Instead of sending (151 * 100) = 15100 bytes, we now send 130 + (20 * 100) = 2130 bytes.
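As a rough check of that arithmetic, here is a minimal Python sketch that renders both forms and counts bytes. The helper names (plain_request, encoded_setup, encoded_request) are made up for illustration, and it assumes every line ends in CRLF and that the 13-byte body is "hello world" followed by CRLF.

```python
CRLF = "\r\n"

def plain_request(method, url, version, headers, body):
    """Render an ordinary HTTP request (request line, headers, blank line, body)."""
    lines = [f"{method} {url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {len(body)}")
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body

def encoded_setup(method, base_url, version, headers):
    """Render the V, B, M and H lines that are sent only once per connection."""
    lines = [f"V:{version}", f"B:{base_url}", f"M:{method}"]
    lines += [f"H:{name}: {value}" for name, value in headers]
    return (CRLF.join(lines) + CRLF).encode("ascii")

def encoded_request(suffix, body):
    """Render a '+' line (URL suffix, space, body length) followed by the body."""
    return f"+:{suffix} {len(body)}{CRLF}".encode("ascii") + body

headers = [
    ("Host", "relay.localhost.lshift.net:8000"),
    ("Content-Type", "text/plain"),
    ("Accept-Encoding", "identity"),
]
body = b"hello world\r\n"  # assumption: the 13 bytes include a trailing CRLF

plain = plain_request("POST", "/someurl", "HTTP/1.1", headers, body)
setup = encoded_setup("POST", "/someurl", "HTTP/1.1", headers)
repeat = encoded_request("", body)

plain_total = 100 * len(plain)
encoded_total = len(setup) + 100 * len(repeat)
print(len(plain), len(setup), len(repeat), plain_total, encoded_total)
```

The one-off setup cost is paid once per connection, so the saving grows with the number of requests sent over it.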
The scheme as described so far takes care of the unchanging parts of
repeated HTTP requests; for the changing parts, such as Accept and
Referer headers, we need to make use of the < and >
opcodes. Before I get into that, though, let’s take a look at how the
scheme so far might work in the case of OPRA.
Each OPRA quote update is on average 66 bytes long, making for around 63MB/s of raw content.
Let’s imagine that each delivery appears as a separate HTTP request:
POST /receiver HTTP/1.1
Host: opra-receiver.example.com
Content-Type: application/x-opra-quote
Accept-Encoding: identity
Content-Length: 66

blablablablablablablablablablablablablablablablablablablablablabla
That’s 213 bytes long: an overhead of 220% over the raw message content.
Encoded using the stateful scheme above, the first request appears on the wire as
V:HTTP/1.1
B:/receiver
M:POST
H:Host: opra-receiver.example.com
H:Content-Type: application/x-opra-quote
H:Accept-Encoding: identity
+: 66
blablablablablablablablablablablablablablablablablablablablablabla
and subsequent requests as
+: 66
blablablablablablablablablablablablablablablablablablablablablabla
for an amortized per-request size of 73 bytes: a much less problematic overhead of 11%. In summary:
| Encoding | Message body (bytes) | Per-message overhead (bytes) | Size increase over raw content | Bandwidth at 1M msgs/sec (MiB/s) |
|---|---|---|---|---|
| Plain HTTP | 66 | 147 | 220% | 203.1 |
| Encoded HTTP | 66 | 7 | 11% | 69.6 |
Using plain HTTP, the feed doesn’t fit on a single gigabit Ethernet link. Using our encoding scheme, it does.
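The figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes a megabyte of 2^20 bytes (which is what matches the bandwidth numbers quoted here), treats a gigabit link as a flat 125,000,000 bytes per second, and ignores TCP/IP and Ethernet framing. The function name summarize is made up for illustration.

```python
MESSAGES_PER_SECOND = 1_000_000
BODY_BYTES = 66
MIB = 1024 * 1024
GIGABIT_BYTES_PER_SECOND = 1_000_000_000 / 8  # assumption: no framing overhead

def summarize(bytes_per_message):
    """Return (overhead bytes, overhead %, MiB/s, fits on one gigabit link?)."""
    overhead = bytes_per_message - BODY_BYTES
    percent = round(100 * overhead / BODY_BYTES, 1)
    bytes_per_second = bytes_per_message * MESSAGES_PER_SECOND
    mib_per_second = round(bytes_per_second / MIB, 1)
    fits = bytes_per_second <= GIGABIT_BYTES_PER_SECOND
    return overhead, percent, mib_per_second, fits

plain = summarize(213)    # ordinary HTTP request size from the example
encoded = summarize(73)   # amortized size with the stateful encoding
print(plain)
print(encoded)
```

To two significant figures the overhead percentages come out at the 220% and 11% quoted in the text.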
Besides the savings in terms of bandwidth, the encoding scheme could also help with saving CPU. After processing the headers once, the results of the processing could be cached, avoiding unnecessary repetition of potentially expensive calculations such as routing, authentication, and authorisation.
Above, I mentioned that some headers changed, while others stayed the
same from request to request. The < and > opcodes are intended to
deal with just this situation.
The > opcode stores the current state in a named register, and the
< opcode loads the current state from a register. Headers that don’t
change between requests are placed into a register, and each request
loads from that register before setting its request-specific headers.
To illustrate, imagine the following two requests:
GET / HTTP/1.1
Host: www.example.com
Cookie: key=value
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
GET /style.css HTTP/1.1
Host: www.example.com
Cookie: key=value
Referer: http://www.example.com/
Accept: text/css,*/*;q=0.1
One possible encoding is:
V:HTTP/1.1
B:/
M:GET
H:Host: www.example.com
H:Cookie: key=value
>:config1
H:Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
-:
<:config1
H:Referer: http://www.example.com/
H:Accept: text/css,*/*;q=0.1
-:style.css
By using <:config1, the second request reuses the stored settings
for the method, base URL, HTTP version, and Host and Cookie
headers.
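To make the translation back to regular HTTP concrete, here is a minimal decoder sketch in Python. It is one possible reading of the scheme, not a reference implementation. It assumes that state persists after each request, that setting a header with an existing name replaces the old value, and that a Content-Length header is regenerated for + requests. The function name decode and the abbreviated Accept value in the sample stream are illustrative, and there is no error handling.

```python
import copy

def decode(stream: bytes):
    """Expand the compact opcode syntax into a list of plain HTTP requests."""
    state = {"version": "HTTP/1.1", "base": "", "method": "GET", "headers": {}}
    registers = {}
    requests = []
    pos = 0
    while pos < len(stream):
        end = stream.index(b"\r\n", pos)
        line = stream[pos:end].decode("ascii")
        pos = end + 2
        opcode, _, arg = line.partition(":")
        if opcode == "V":
            state["version"] = arg
        elif opcode == "B":
            state["base"] = arg
        elif opcode == "M":
            state["method"] = arg
        elif opcode == "H":
            name, _, value = arg.partition(": ")
            state["headers"][name] = value      # assumption: same name replaces
        elif opcode == ">":
            registers[arg] = copy.deepcopy(state)   # store current state
        elif opcode == "<":
            state = copy.deepcopy(registers[arg])   # load stored state
        elif opcode in ("-", "+"):
            suffix, body = arg, b""
            if opcode == "+":
                # The argument is "<suffix> <length>"; the suffix may be empty.
                suffix, _, length = arg.rpartition(" ")
                body = stream[pos:pos + int(length)]
                pos += int(length)
            head = [f"{state['method']} {state['base']}{suffix} {state['version']}"]
            head += [f"{n}: {v}" for n, v in state["headers"].items()]
            if opcode == "+":
                head.append(f"Content-Length: {len(body)}")
            requests.append("\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body)
    return requests

sample = (
    b"V:HTTP/1.1\r\nB:/\r\nM:GET\r\n"
    b"H:Host: www.example.com\r\nH:Cookie: key=value\r\n"
    b">:config1\r\n"
    b"H:Accept: text/html\r\n"
    b"-:\r\n"
    b"<:config1\r\n"
    b"H:Referer: http://www.example.com/\r\n"
    b"H:Accept: text/css,*/*;q=0.1\r\n"
    b"-:style.css\r\n"
)
decoded = decode(sample)
for request in decoded:
    print(request.decode("ascii"))
```

Copying the state on both store and load keeps a register from being changed by later H lines, which is what lets the second request drop the first request’s Accept header.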
Most applications of HTTP do fine using ordinary HTTP syntax. I’m not suggesting changing HTTP, or trying to get an encoding scheme like this deployed in any browser or webserver at all. The point of the exercise is to consider how low one might make the bandwidth overheads of a text-based protocol like HTTP for the specific case of a high-speed messaging scenario.
In situations where the semantics of HTTP make sense, but the syntax is just too verbose, schemes like this one can be useful on a point-to-point link. There’s no need for global support for an alternative syntax, since people who are already forming very specific contracts with each other for the exchange of information can choose to use it, or not, on a case-by-case basis.
Instead of specifying a whole new transport protocol for high-speed links, people can reuse the considerable amount of work that’s gone into HTTP, without paying the bandwidth price.
Just as a throwaway comparison, I computed the minimum possible
overhead for sending a 66-byte message using AMQP 0-8 or 0-9. Using a
single-letter queue name, “q”, the overhead is 69 bytes per message,
or 105% of the message body. For our OPRA example at 1M messages per
second, that works out at 128.7 megabytes per second, and we’re back
over the limit of a single gigabit ethernet again. Interestingly,
despite AMQP’s binary nature, its overhead is much higher than a
simple syntactic rearrangement of a text-based protocol in this case.
We considered the overhead of using plain HTTP in a high-speed messaging scenario, and invented a simple alternative syntax for HTTP that drastically reduces the wasted bandwidth.
For the specific example of the OPRA feed, the computed bandwidth requirement of the experimental syntax is only 11% higher than the raw data itself, and roughly a third of what ordinary HTTP would need.
From Jason Salas’s interview with Jeff Lindsay, the guy who invented the term web hooks:
“For example, the Facebook Platform, although pretty complicated and full of their own technology, is still at the core based on web hooks. They call out to a user-defined external web application and integrate that with their application. That’s quite a radically different use of web hooks compared to the way people think of them in relation to XMPP.”
That’s an interesting point: while nothing is stopping XMPP from being used this way, it’s not how it is currently used. XMPP seems to be gaining some adoption for asynchronous or messaging-style tasks, but I haven’t seen much in the way of generalised RPC over XMPP yet. (Perhaps I’ve overlooked something obvious?) HTTP, on the other hand, is being used both for asynchronous operations (HTTP push, where the HTTP response has no body, and serves as an acknowledgement of receipt or completion) and for synchronous RPC-like operations (JSON-RPC, SOAP, CGI, ordinary static web pages).
Web hooks can be seen as an approach to making it easier for people to participate in the world of distributed objects that is HTTP — a worthy goal.
For some reason we are busier than ever at the moment, and we’re even more keen than usual to recruit. There are recruitment details on our website, but I wanted to put it into my own words, as something I can send to my friends when I’m telling them what we’re about and why you want to come and work for us.
What we do: LShift is primarily a contract coding shop: we rent out our software expertise to whoever needs it, mostly by the hour but occasionally on a fixed price basis. We’re not centered around any particular specialization; though a lot of jobs we get are primarily web-based applications, we’ve done a variety of other jobs from PC device management software to our message queueing system RabbitMQ. And we don’t fixate on any one language: just now there are projects in Java, Python, C#, a little C, and Erlang; Java’s the biggest as you’d expect, but in the past we’ve used Scheme and an upcoming project may end up in Haskell. We’ve done jobs that lasted only a week, while others have run for years, building up strong relationships with clients who come back to us again and again when they need work done. We are also branching out into in-house projects - RabbitMQ as linked above, and Expro. This variety is reflected in what we look for from you: we don’t require knowledge of any specific programming language or environment; we want to know that you’re flexible enough to pick up the skills that may be needed in a particular situation, and to pick the right tool for the job.
How we do it: unlike many such businesses, we don’t try to hide the technical side from the client side but put it front and center. Every developer we hire is expected to be able to lead a project: the lead is the first person the client talks to about how the project is going, and guiding the project to completion is in their hands. Because one person is in control of technical, project management, and client-facing roles, these aspects pull together instead of against each other: developers aren’t pulling against management, the client gets accurate information on what can be done and how, and you get to find out the client’s needs from the horse’s mouth. This means that you have to be happy taking on responsibility to work here; for many, it’s a welcome change.
What it’s like working here: frankly I can’t imagine there are many better places to work. Come to interview, and you’ll have a chance to meet and talk to everyone who works here and gain your own impressions of what it’s like, once in the office and then again in the pub! If you’re taken on, you’ll be working in a room full of smart and knowledgeable people who you will want to learn from and who will want to learn from you, in a friendly and productive environment which values getting things done and doing things right. Send your CV to impress@lshift.net, and if you’ve got the skills you could soon be doing something a lot more fun and varied.
A few days ago I introduced EvServer. In this post I’ll present a simple EvServer example.
EvServer is a normal WSGI server, but with one additional feature. Instead of blocking in your WSGI application you yield a file descriptor to the server. On descriptor activity the server will continue your
WSGI app till it yields again.
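To illustrate the yield-and-resume idea in isolation, here is a toy sketch in plain Python. It is not EvServer’s implementation or API: the WaitReadable token, toy_app and toy_serve are hypothetical names invented for this example, it drives only a single generator, and it uses select on a local socket pair.

```python
import select
import socket

class WaitReadable:
    """Hypothetical token: resume me once sock is readable or the timeout expires."""
    def __init__(self, sock, timeout=None):
        self.sock = sock
        self.timeout = timeout

def toy_app(sock):
    """Toy generator app: waits for input twice, emitting one block each time."""
    for _ in range(2):
        yield WaitReadable(sock, timeout=1.0)  # hand control back to the server
        yield b"got " + sock.recv(3)           # resumed: read one 3-byte message

def toy_serve(app_iter):
    """Drive a single generator; a real server would multiplex many of them."""
    output = []
    for item in app_iter:
        if isinstance(item, WaitReadable):
            # The server, not the application, is the one that waits.
            select.select([item.sock], [], [], item.timeout)
        else:
            output.append(item)  # an ordinary block of response data
    return output

reader, writer = socket.socketpair()
writer.sendall(b"one")
writer.sendall(b"two")
blocks = toy_serve(toy_app(reader))
reader.close()
writer.close()
print(blocks)
```

Because the waiting happens in the server loop, one thread could in principle service many such generators by passing all of their descriptors to a single select call.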
I’ll show how to wait for AMQP messages inside the WSGI application and how to push them up to the browser. If you can’t wait till the end of the post, please feel free to view the online demo (outdated) of the code described below.
Consuming (from) the Rabbit
To consume AMQP messages we need a python amqp library (version 0.6). Barry Pederson rewrote it lately and I must admit I really like the code now. Great work Barry! Usually, the code that consumes messages (a subscriber) looks like this:
import amqplib.client_0_8 as amqp

def callback(msg):
    print msg.body

def main():
    conn = amqp.Connection('localhost', userid='guest', password='guest')
    ch = conn.channel()
    ch.access_request('/data', active=True, read=True)
    ch.exchange_declare('myfan', 'fanout', auto_delete=True)
    qname, _, _ = ch.queue_declare()
    ch.queue_bind(qname, 'myfan')
    ch.basic_consume(qname, callback=callback)
    while ch.callbacks:
        ch.wait()  # blocking here!
    ch.close()
    conn.close()
The problem is that we need to use it in an Asynchronous WSGI application which is non blocking. The conversion is not very difficult. First, we need to identify the socket descriptor on which the wait() blocks, then we need to make it non-blocking. This solution is quite hackish, but it’s all we can do. Maybe this hack should be put in the Hall of Fame of Dirty Hacks…
sd = conn.transport.sock
sd.setblocking(False)
At this point, a bad thing happens when ch.wait() tries to block on a non-blocking descriptor. We’d expect the socket to raise socket.error: (11, ‘Resource temporarily unavailable’). Actually, it fails with quite a strange exception, but fortunately it doesn’t break anything inside py-amqplib:
Traceback (most recent call last):
  [...]
  File "/home/majek/amqplib-0.6/amqplib/client_0_8/connection.py", line 201, in _wait_method
    self.method_reader.read_method()
TypeError: 'NoneType' object is not iterable
We use this exception to identify when the library wants to block. Having this knowledge, the main loop of a non-blocking consumer is now easy to write:
conn.transport.sock.setblocking(False)
while True:
    try:
        while True:  # until the exception
            ch.wait()
    except TypeError:
        pass
    <block till activity on conn.transport.sock>
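The same drain-then-wait pattern can be tried without an AMQP broker. The sketch below uses a plain socket pair from the standard library instead of py-amqplib, so the signal that there is nothing more to read is BlockingIOError rather than the TypeError seen above. The helper names drain and wait_readable are made up for illustration, and select stands in for the "block till activity" step.

```python
import select
import socket

def drain(sock):
    """Read everything currently available on a non-blocking socket."""
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            break  # the read would block: nothing more to read right now
        if not data:
            break  # the peer closed the connection
        chunks.append(data)
    return b"".join(chunks)

def wait_readable(sock, timeout=None):
    """Block until sock has data to read; return False if the timeout expired."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)

reader, writer = socket.socketpair()
reader.setblocking(False)

nothing_yet = drain(reader)            # returns at once instead of blocking
writer.sendall(b"hello")
ready = wait_readable(reader, timeout=1.0)
received = drain(reader)
reader.close()
writer.close()
print(nothing_yet, ready, received)
```

In an asynchronous WSGI application the call to wait_readable would be replaced by yielding the descriptor to the server, so that the waiting happens outside the application.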
At this point modifying the code to become a valid AWSGI application is straightforward. Full code can be found in the EvServer examples directory. Here’s the simplified version:
def wsgi_subscribe(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/plain')])
    msgs = []
    def callback(msg):
        msgs.append(msg.body)
        msg.channel.basic_ack(msg.delivery_tag)
    <setup connection, channel, queue>
    conn.transport.sock.setblocking(False)
    try:
        while ch.callbacks:
            try:
                while True:  # until exception
                    ch.wait()
            except (TypeError,), e:
                pass
            yield 'got messages: %r\n' % (msgs,)
            while msgs:
                msgs.pop()  # empty the queue
            # block!
            yield environ['x-wsgiorg.fdevent.readable'](conn.transport.sock)
    except GeneratorExit:
        pass
Introducing the Comet library
Michael Carter has figured out how to make Comet - long lasting HTTP push connection - work on all major browsers. As a part of EvServer I distribute a simple javascript Comet library based on his work. The basic API consists of one method that creates a comet channel: comet_connection(url, callback).
<script src="./static/comet.js" type="text/javascript"></script>
<script>
function user_callback(data){
    alert('got message: ' + data);
}
close_comet_function = comet_connection(url, user_callback);
</script>
There are many different encapsulation types for Comet messages and the format of the messages emitted from the server depends sensitively on the browser type. Fortunately the WSGI application can remain quite simple by using the evserver.transports wrapper:
import evserver.transports as transports
def simplest_comet_application(environ, start_response):
    t = transports.get_transport(<comet transport family>)
    start_response('200 OK', t.get_headers())
    yield t.start()
    yield t.write('first message!')
    yield t.write('second message!')
Please, don’t build a chat
I think we’re all sick of yet-another-chat examples. I propose to build something similar, but instead of broadcasting chat messages let’s broadcast the user-agent and referrer HTTP fields of people who view the site.
The HTML page is going to be dead simple.
On the server side we will serve the comet.js file and the main HTML file. While serving the HTML, we’ll send an AMQP message. We also need to create a Comet channel URL. Let’s use Django: in total we need three Django views, one of which can be generic.
Deployment
We need: python2.5, rabbitmq, evserver, py-amqplib-0.6, django and my django project code.
sudo apt-get install erlang-nox setuptools subversion
wget http://www.rabbitmq.com/releases/rabbitmq-server/v1.5.1/rabbitmq-server_1.5.1-1_all.deb
sudo dpkg -i rabbitmq-server_1.5.1-1_all.deb
wget http://evserver.googlecode.com/files/<current-evserver-version>
sudo easy_install evserver-*.egg
wget http://barryp.org/static/software/download/py-amqplib/0.6/amqplib-0.6.tgz
tar xvzf amqplib-0.6.tgz
cd amqplib-0.6
sudo python setup.py install
cd ..
wget http://www.djangoproject.com/download/1.0.2/tarball/
tar xzvf Django-1.0.2-final.tar.gz
cd Django-1.0.2-final
sudo python setup.py install
cd ..
svn checkout http://evserver.googlecode.com/svn/trunk/evserver/examples/django_agentpush django_agentpush
After these steps, to run the code just type:
cd django_agentpush
./manage.py runevserver
or
PYTHONPATH=".." DJANGO_SETTINGS_MODULE=django_agentpush.settings evserver --framework=django
Currently you should be able to view the working code online (outdated). The source code is here.
Summary
I’ve shown how to build a comet application that waits for messages from RabbitMQ. Using these tools you should be able to build ambitious real-time collaboration web applications, like EtherPad. I’m starting to believe that the hardest part in real-time web apps is the javascript.
LShift is recruiting! (thanks to all of you that already have contacted us, we’re still reviewing your CVs)
Erlang/OTP’s global module helps with atomic assignment of names for processes in a distributed Erlang cluster. It makes sure that only a single process at a time holds any given name, across all connected nodes. Unlike the local name registration function, names aren’t limited to being atoms: with global, they can be any term at all.
To see global’s conflict-resolution in action, we need to register a name on two nodes not initially connected, and then make them aware of each other. The system will pick one registration to survive, and will terminate the other registration.
First, register the name “a” on each of two nodes (started with erl -sname one and erl -sname two, respectively). On node one:
Eshell V5.6.2 (abort with ^G)
(one@walk)1> global:register_name(a, self()).
yes
(one@walk)2> global:whereis_name(a).
<0.37.0>
We see that the name was registered successfully (the call to register_name returned yes), and that when looked up, a pid (the pid of the shell process) is returned, as we would expect. Now, the same on node two:
Eshell V5.6.2 (abort with ^G)
(two@walk)1> global:register_name(a, self()).
yes
(two@walk)2> global:whereis_name(a).
<0.37.0>
Again, we see it succeeding. Note that each node has successfully registered the “global” name “a”. This is because they are unaware of each other. Once they’re connected, Erlang/OTP will automatically resolve the situation. By default, it does this by terminating one of the two contending processes.
Let’s see what happens. Connect the two nodes together, by pinging one from the other — here, pinging node two from node one:
(one@walk)3> net_adm:ping(two@walk).
pong
(one@walk)4>
=INFO REPORT==== 13-Feb-2009::03:05:22 ===
global: Name conflict terminating {a,<5744.37.0>}
(one@walk)4> global:whereis_name(a).
<0.37.0>
(one@walk)5>
See that the termination of one of the contenders is reported with a message in the system log. It was the registration on node two that was terminated, and the registration on node one that survived. Here’s what we see on node two:
** exception error: killed
(two@walk)3> global:whereis_name(a).
<5768.37.0>
(two@walk)4> node(global:whereis_name(a)).
one@walk
Node two’s registered process has been killed. When we then ask about the registration for the name “a”, we see a pid from node one.
Finally, we’ll try registering the name for a second time:
(two@walk)5> global:register_name(a, self()).
no
(two@walk)6>
It answers no because there’s already a registration that it knows about in the system. The same no answer would have been returned if we’d tried the same thing on node one instead.
I tend to think of Oracle’s BDB Java Edition as the solution to all storage performance problems: the least performance sacrifice you can possibly make to get ACID storage in Java. So when our automated configuration import for Magnolia got slower and slower, it was naturally what I turned to.
It’s particularly suited to this case, because Jackrabbit has a very simple contract for a persistence manager to implement, and handles indexing (and hence searching) separately, so there’s no need to handle query optimization.
Unfortunately, it didn’t immediately prove to be reliable, and I had to shelve it until I had some more time to work on it. In fact, the problems turn out to be simple - they just benefited from fresh eyes.
I haven’t wanted to delve much into Jackrabbit, so I have ignored how it uses the persistence manager. PersistenceManager.store(ChangeLog) does all its work in a transaction. This is probably enough to make Jackrabbit ACID; however, either Magnolia or Jackrabbit has this wrong: you can certainly leave your Magnolia instance in an inconsistent state if you interrupt the application.
I’ve mostly noted this as various strategies to get the BDB environment shut down properly have failed. You want to share an environment between multiple workspaces, because Magnolia sets up lots of them, and BDB’s cache would be very inefficient if it were split up into lots of little segments. It’s not easy to use finalize to clean up the environment: I use the collections interface, which I have to pass the environment to directly, and of course BDB runs several threads, which means an environment won’t be garbage collected until it’s closed. In the end, I’ve settled on tracking the set of workspaces using the environment. I also have a fall-back: a ServletContextListener that will shut down any open environments when the servlet context shuts down.
I still need to write some decent documentation for this, but there is the Mercurial repository, if you want to try it out. It’s a Maven project.
It is fast, but I haven’t measured the performance. Because of the above-mentioned problems with Magnolia and transactions, you may not want to use it for your production Magnolia author instance, but it’s great for developers and public instances. A remotely accessible service would be at least as reliable as using an external DBMS. It’s already no worse than using an embedded DB or other embedded persistence manager.
A long, long time ago there was a WSGI spec. This document described a lot of interesting stuff. Among other very important paragraphs you could find a hidden gem:
[...] applications will usually return an iterator (often a generator-iterator) that produces the output in a block-by-block fashion. These blocks may be broken to coincide with mulitpart boundaries (for “server push”), or just before time-consuming tasks (such as reading another block of an on-disk file). [...]
import datetime
import time

def clock_demo(environ, start_response):
    start_response("200 OK", [('Content-type','text/plain')])
    for i in range(100):
        yield "%s\n" % (datetime.datetime.now(),)
        time.sleep(1)  # blocks the worker for a second between blocks
The problem is that this way of programming just doesn’t work well. It’s not scalable, requires a lot of threads and can eat a lot of resources. That’s why the feature has been forgotten.
def clock_demo(environ, start_response):
    start_response("200 OK", [('Content-type','text/plain')])
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(100):
            yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
            yield "%s\n" % (datetime.datetime.now(),)
    except GeneratorExit:
        pass
    sd.close()
So I created a server that supports it: EvServer, the Asynchronous Python WSGI Server.
| Server | Fetches/sec |
|---|---|
| evserver | 4254 |
| spawning with threads | 1237 |
| spawning without threads | 2200 |
| cherrypy wsgi server | 1700 |
Examples
Admittedly using raw WSGI for regular web applications is a bit inconvenient. Fortunately decent web frameworks support passing iterators from the web application down to the WSGI server, throughout all the framework. On my list of frameworks that support iterators you can find: Django and Web.py.
Django
Django 1.0 supports returning iterators from views. This is Django code for the clock example:
def django_clock(request):
    def iterator():
        sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            while True:
                yield request.environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                yield '%s\n' % (datetime.datetime.now(),)
        except GeneratorExit:
            pass
        sd.close()
    return HttpResponse(iterator(), mimetype="text/plain")
The problem is that this code is not going to work using the standard ./manage.py runserver development server. Fortunately, it’s very easy to integrate EvServer with Django: you only need to put this into settings.py:
INSTALLED_APPS = (
    [...]
    'django.contrib.sites',
    'evserver',  # <<< THIS LINE enables runevserver command
)
Now you can test your app using ./manage.py runevserver.
Web.py
class webpy_clock:
    def GET(self, name):
        web.header('Content-Type', 'text/plain', unique=True)
        environ = web.ctx.environ
        def iterable():
            sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # any udp socket
            try:
                while True:
                    yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                    yield "%s\n" % (datetime.datetime.now(),)
            except GeneratorExit:
                pass
            sd.close()
        return iterable()
The full source code is included in the EvServer examples directory. You can run this code using the command:
evserver --exec "import examples.framework_webpy; application = examples.framework_webpy.application"
Summary
I haven’t discussed any useful scenario yet; I’ll try to do that in a future post. I’m thinking of some interesting uses for EvServer, such as pushing data to the browser using Comet.
LShift is recruiting!
You are currently browsing the LShift Ltd. blog archives for February, 2009.