As part of a customer project some years ago, we wrote an implementation of the interesting parts of RFC 3339 for Python. The abstract for the RFC says:
This document defines a date and time format for use in Internet protocols that is a profile of the ISO 8601 standard for representation of dates and times using the Gregorian calendar.
We needed to be able to robustly transfer timestamps between languages (JavaScript and Python, chiefly) without getting tangled up in timezone troubles or complex ambiguous parsing problems.
Our code provides
tzinfo class and singleton instance
tzinfo class
Examples
These examples are taken from the doctests/docstrings in the module source itself. See the module documentation for many more informative examples.
Parsing a timestamp, with timezone support and timestamp equivalence:
>>> midnightUTC = parse_datetime("2008-08-24T00:00:00Z")
>>> oneamBST = parse_datetime("2008-08-24T01:00:00+01:00")
>>> midnightUTC == oneamBST
True
Printing a timestamp:
>>> oneamBST.isoformat()
'2008-08-24T01:00:00+01:00'
>>> parse_datetime("2008-08-24T00:00:00.123Z").isoformat()
'2008-08-24T00:00:00.123000+00:00'
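To make the parsing step concrete, here is a minimal sketch of an RFC 3339 date-time parser that uses only Python 3's standard library. It is not the module's actual implementation: the name parse_rfc3339_sketch and the regular expression are assumptions made for illustration, and leap seconds (a seconds field of 60) are not handled.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern covering the common RFC 3339 date-time shape.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"([Zz]|[+-]\d{2}:\d{2})"
)


def parse_rfc3339_sketch(text):
    """Parse an RFC 3339 timestamp into a timezone-aware datetime."""
    m = _RFC3339.fullmatch(text)
    if m is None:
        raise ValueError("not an RFC 3339 date-time: %r" % (text,))
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    fraction = m.group(7)
    # Pad or truncate the fractional digits to microsecond precision.
    micro = int((fraction[1:] + "000000")[:6]) if fraction else 0
    zone = m.group(8)
    if zone in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tz = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
```

Because the result is timezone-aware, two timestamps naming the same instant in different zones compare equal, which is the property the examples above rely on.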
Downloading the code
The code is available on github. It’s MIT-licensed.
You can also install the module directly from github using pip:
pip install -e git://github.com/tonyg/python-rfc3339.git#egg=rfc3339
My experience with SML/NJ has been almost uniformly positive, over the years. We used it extensively in a previous project to write a compiler (targeting the .NET CLR) for a pi-calculus-based language, and it was fantastic. One drawback with it, though, is the lack of documentation. Finding out how to (a) compile for and (b) use CML takes real stamina. I’ve only just now, after several hours poring over webpages, mailing lists, and library source code, gotten to the point where I have a running socket server.
The following example consists of a .cm file for building the program, and the .sml file itself. The complete sources:
Running the following command compiles the project:
ml-build test.cm Testprog.main
The ml-build output is a heap file, with a file extension dependent on your architecture and operating system. For me, right now, it produces test.x86-darwin. To run the program:
sml @SMLload=test.x86-darwin
substituting the name of your ml-build-produced heap file as necessary.
On Ubuntu, you will need to have run apt-get install smlnj libcml-smlnj libcmlutil-smlnj to ensure both SML/NJ and CML are present on your system.
The test.cm file contains
Group is
$cml/basis.cm
$cml/cml.cm
$cml-lib/smlnj-lib.cm
test.sml
which instructs the build system to use the CML variants of the basis and the standard SML/NJ library, as well as the core CML library itself and the source code of our program. For more information about the SML CM build control system, see here.
Turning to test.sml now, we first declare the ML structure (module) we’ll be constructing. The structure name is also part of one of the command-line arguments to ml-build above, telling it which function to use as the main function for the program.
structure Testprog = struct
Next, we bring the contents of the TextIO module into scope. This is necessary in order to use the print function with CML; if we use the standard version of print, the output is unreliable. The special CML variant is needed. We also declare a local alias SU for the global SockUtil structure.
open TextIO
structure SU = SockUtil
ML programs end up being written upside down, in a sense, because function definitions need to precede their use (unless mutually-recursive definitions are used). For this reason, the next chunk is connMain, the function called in a new lightweight thread when an inbound TCP connection has been accepted. Here, it simply prints out a countdown from 10 over the course of the next five seconds or so, before closing the socket. Multiple connections end up running connMain in independent threads of control, leading automatically to the natural and obvious interleaving of outputs on concurrent connections.
fun connMain s =
    let fun count 0 = SU.sendStr (s, "Bye!\r\n")
          | count n = (SU.sendStr (s, "Hello " ^ (Int.toString n) ^ "\r\n");
                       CML.sync (CML.timeOutEvt (Time.fromReal 0.5));
                       count (n - 1))
    in
        count 10;
        print "Closing the connection.\n";
        Socket.close s
    end
The function that depends on connMain is the accept loop, which repeatedly accepts a connection and spawns a connection thread for it.
fun acceptLoop server_sock =
    let val (s, _) = Socket.accept server_sock
    in
        print "Accepted a connection.\n";
        CML.spawn (fn () => connMain(s));
        acceptLoop server_sock
    end
The next function is the primordial CML thread, responsible for creating the TCP server socket and entering the accept loop. We set SO_REUSEADDR on the socket, listen on port 8989 with a connection backlog of 5, and enter the accept loop.
fun cml_main (program_name, arglist) =
    let val s = INetSock.TCP.socket()
    in
        Socket.Ctl.setREUSEADDR (s, true);
        Socket.bind(s, INetSock.any 8989);
        Socket.listen(s, 5);
        print "Entering accept loop...\n";
        acceptLoop s
    end
Finally, the function we told ml-build to use as the main entry point of the program. The only thing we do here is disable SIGPIPE (otherwise we get rudely killed if a remote client’s socket closes!) and start CML’s scheduler running with a primordial thread function. When the scheduler decides that everything is over and the program is complete, it returns control to us. (The lone end closes the struct definition way back at the top of the file.)
fun main (program_name, arglist) =
    (UnixSignals.setHandler (UnixSignals.sigPIPE, UnixSignals.IGNORE);
     RunCML.doit (fn () => cml_main(program_name, arglist), NONE);
     OS.Process.success)
end
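To exercise the running server, a small client can connect to port 8989 and collect whatever the connection thread sends until the socket is closed. The following is a minimal sketch in Python rather than SML; the helper names read_lines and fetch_countdown are hypothetical, and the default host of localhost is an assumption.

```python
import socket


def read_lines(sock):
    """Read from sock until the peer closes it, then return the decoded lines."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # an empty read means the server closed the connection
            break
        chunks.append(data)
    return b"".join(chunks).decode("ascii").splitlines()


def fetch_countdown(host="localhost", port=8989):
    """Connect to the countdown server and return every line it sent."""
    with socket.create_connection((host, port)) as sock:
        return read_lines(sock)
```

Calling fetch_countdown() from two terminals at once is a quick way to see the interleaving described above. Note that this sketch only returns once the server has closed the connection, so the lines are not shown as they arrive.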
I’ve just released Snarl, a Growl-like notification system for Squeak. To use it,
Snarl label: 'Something happened'
body: 'What could it have been?'
I’ve recorded a quick demo:
(It’s pretty blurry, so I’ve uploaded it to vimeo too, but it’s still in the queue for conversion; when it’s converted, it’ll be here.)
The code is three classes: one tiny convenience class, Snarl; one TextMorph subclass, which does almost all the work; and one helper TextAttribute subclass, for fading out coloured text along with the rest of each notification. In total, it’s 205 lines of text, including documentation.
Recently, as part of a Seaside-based application running within Squeak, I wanted to send HTML-formatted notification emails when certain things happened within the application.
It turns out that Squeak has a built-in SMTP client library, which with a small amount of glue can be used with Seaside’s HTML renderer to send HTML formatted emails using code similar to that used when rendering Seaside website components.
sendHtmlEmailTo: toEmailAddressString
    from: fromEmailAddressString
    subject: subjectString
    with: aBlock
    | m b bodyHtml |
    m := MailMessage empty.
    m setField: 'from' toString: fromEmailAddressString.
    m setField: 'to' toString: toEmailAddressString.
    m setField: 'subject' toString: subjectString.
    m setField: 'content-type' toString: 'text/html'.
    b := WAHtmlBuilder new.
    b canvasClass: WARenderCanvas.
    b rootClass: WAHtmlRoot.
    bodyHtml := b render: aBlock.
    m body: (MIMEDocument contentType: 'text/html' content: bodyHtml).
    SMTPClient deliverMailFrom: m from
        to: {m to}
        text: m asSendableText
        usingServer: 'YOUR.SMTP.SERVER.EXAMPLE.COM'.
The aBlock argument should be like the body of a WAComponent’s renderContentOn: method. Here’s an example:
whateverObjectYouInstalledTheMethodOn
    sendHtmlEmailTo: 'target@example.com'
    from: 'source@example.org'
    subject: 'Hello, world'
    with: [:html |
        html heading level3 with: 'This is a heading'.
        html paragraph with: 'Hi there!']
On the 9th, last Thursday, I spoke at the Online Gaming High Scalability SIG at Skills Matter. The talk covered
The slides are available for download here (PDF – with notes), and are also available on SlideShare:
RabbitHub is our implementation of PubSubHubBub, a straightforward pubsub layer on top of plain old HTTP POST — pubsub over Webhooks. It’s not well documented yet (understatement), but that will change.
It gives every AMQP exchange and queue hosted by a RabbitMQ broker a couple of URLs: one to use for delivering messages to the exchange or queue, and one to use to subscribe to messages forwarded on by the exchange or queue. You subscribe with a callback URL, so when messages arrive, RabbitHub POSTs them on to your callback. For example,
http://dev.rabbitmq.com/rabbithub/endpoint/x/amq.direct is the URL for delivering messages to the “amq.direct” exchange on our public test instance of RabbitMQ, and
http://dev.rabbitmq.com/rabbithub/subscribe/q/some_queue_name is the URL for subscribing to messages from the (hypothetical) queue “some_queue_name” on the broker.
(The symmetrical …/subscribe/x/… and …/endpoint/q/… also exist.)
The PubSubHubBub protocol specifies some RESTful(ish) operations for establishing subscriptions between message sources (a.k.a. “topics”) and message sinks. RabbitHub implements these operations as well as a few more for RESTfully creating and deleting exchanges and queues.
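As an illustration of what a subscription operation might look like on the wire, here is a rough Python sketch that builds (but does not send) a form-encoded POST. The field names hub.mode, hub.callback, hub.topic and hub.verify follow the PubSubHubBub 0.3 draft as I understand it; whether RabbitHub expects exactly this set of fields is an assumption, and build_subscription_request is a hypothetical helper, not part of RabbitHub.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subscription_request(hub_url, callback_url, topic, mode="subscribe"):
    """Build a PubSubHubBub-style subscription POST without sending it."""
    form = {
        "hub.mode": mode,              # "subscribe" or "unsubscribe"
        "hub.callback": callback_url,  # where the hub should POST messages
        "hub.topic": topic,            # assumed: what to subscribe to
        "hub.verify": "sync",          # assumed: synchronous verification
    }
    return Request(
        hub_url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would actually issue the request; the sketch stops short of that so it can be inspected safely.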
Combining RabbitHub with the AMQP protocol implemented by RabbitMQ itself and with the other adapters and gateways that form part of the RabbitMQ universe lets you send messages across different kinds of message networks — for example, our public RabbitMQ instance, dev.rabbitmq.com, has RabbitHub running as well as the standard AMQP adapter, the rabbitmq-xmpp plugin, and a bunch of our other experimental stuff, so you can do things like this:

become XMPP friends with pshb@dev.rabbitmq.com (the XMPP adapter gives each exchange a JID of its own)
use PubSubHubBub to subscribe the sink http://dev.rabbitmq.com/rabbithub/endpoint/x/pshb to some PubSubHubBub source — perhaps one on the public Google PSHB instance. (Note how the given URL ends in “x/pshb”, meaning the “pshb” exchange — which lines up with the JID we just became XMPP friends with.)
wait for changes to be signalled by Google’s PSHB hub to RabbitHub
when they are, you get an XMPP IM from pshb@dev.rabbitmq.com with the Atom XML that the hub sent out as the body
RabbitHub is content-agnostic — you don’t have to send Atom around — so the fact that Atom appears is an artifact of what Google’s public PSHB instance is mailing out, rather than anything intrinsic in pubsub-over-webhooks.
We’ve also been experimenting with using http://www.reversehttp.net/ to run a PubSubHubBub endpoint in a webpage — see for instance http://www.reversehttp.net/demos/endpoint.html and its associated Javascript for a simple prototype of the idea. I’m playing with building a simple PSHB hub in Javascript using the same tools.
Many websites refuse to accept email addresses of the form myusername+sometext@gmail.com, despite the fact that the +sometext is perfectly legitimate (see footnote 1) and is an advertised feature Gmail offers for creating pseudo-single-use email addresses from a base email address.
My guess is that the developers of these sites think, because they’re either lazy or incompetent, that email addresses have more restrictions than they in fact have. It’s reasonable (and fairly easy) these days to check the syntax of the DNS part of an email address, because few people use non-DNS or non-SMTP transfer methods anymore, but the mailbox part is extremely flexible and hard to check accurately. A sane thing to do is just trust the user, and send a test mail to validate the address.
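A lenient check along these lines might look like the following Python sketch. It validates only the DNS part and accepts any non-empty mailbox part, leaving real validation to a test email. The function name plausible_address and the requirement of at least two domain labels are assumptions made for illustration.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def plausible_address(address):
    """Return True if the address has a non-empty mailbox and a sane DNS part."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        return False  # no "@" at all, or nothing before it
    labels = domain.split(".")
    # Assumption: require at least two labels, so "user@localhost" is rejected.
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)
```

With this check, myusername+sometext@gmail.com passes, because nothing about the mailbox part is second-guessed.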
I picked on Yahoo in the title of this post: Yahoo are by no means the only offender, but I just signed up for a yahoo account, so they’re for me the most recent. Their signup form also refused to provide any guidance about why they were rejecting the form submission: I had to use my previous experience of sites wrongly rejecting valid email addresses to guess what the problem might be. Fail.
Footnote 1: According to my best reading of the relevant RFCs, anyway. See the definition of dot-atom in section 3.2.4 of RFC 2822, referenced in this context by section 3.4.1.
OpenAMQ has released their JMS client for using JMS with AMQP-supporting brokers. This afternoon I experimented with getting it running with RabbitMQ.
After a simple, small patch to the JMS client code, to make it work with the AMQP 0-8 spec that RabbitMQ implements (rather than the 0-9 spec that OpenAMQ implements), the basic examples shipped with the JMS client library seemed to work fine. The devil is no doubt in the details, but no problems leapt out at me.
To get it going, I checked it out using Git (git clone git://github.com/pieterh/openamq-jms.git). Compilation was as simple as running ant. Kudos to the OpenAMQ team for making the build process so smooth! (Not to mention writing a great piece of software :-) )
The changes to make it work with AMQP 0-8 were:
retrieving the 0-8 specification XML
changing the JMS client library’s build.xml file to point to the downloaded file in its generate.spec variable
changing one line of code in src/org/openamq/client/AMQSession.java: in 0-8, the final null argument to BasicConsumeBody.createAMQFrame must be omitted
re-running the ant build
After this, and creating a /test virtual-host using RabbitMQ’s rabbitmqctl program, the OpenAMQ JMS client examples worked fine, as far as I could tell.
rabbitmqctl add_vhost /test
rabbitmqctl set_permissions -p /test guest '.*' '.*' '.*'
You can download the patch file I applied to try it yourself. Note that you’ll need to put the correct location to your downloaded amqp0-8.xml file into build.xml.
I’ve been working recently on Reverse HTTP, an approach to making HTTP easier to use as the distributed object system that it is. My work is similar to the work of Lentczner and Preston, but is independently invented and technically a bit different: one, I’m using plain vanilla HTTP as a transport, and two, I’m focussing a little more on the enrollment, registration, queueing and management aspects of the system. My draft spec is here (though as I’m still polishing, please excuse its roughness), and you can play with some demos or download and play with an implementation of the spec.
Comments welcome!
HTTP/1.1 is a lovely protocol. Text-based, sophisticated, flexible. It does tend toward the verbose though. What if we wanted to use HTTP’s semantics in a very high-speed messaging situation? How could we mitigate the overhead of all those headers?
Now, bandwidth is pretty cheap: cheap enough that for most applications the kind of approach I suggest below is ridiculously far over the top. Some situations, though, really do need a more efficient protocol: I’m thinking of people having to consume the OPRA feed, which is fast approaching 1 million messages per second (1, 2, 3). What if, in some bizarre situation, HTTP was the protocol used to deliver a full OPRA feed?
Instead of having each HTTP request start with a clean slate after the previous request on a given connection has been processed, how about giving connections a memory?
Let’s invent a syntax for HTTP that is easy to translate back to regular HTTP syntax, but that avoids repeating ourselves quite so much.
Each line starts with an opcode and a colon. The rest of the line is interpreted depending on the opcode. Each opcode-line is terminated with CRLF.
V:HTTP/1.x            Set HTTP version identifier.
B:/some/base/url      Set base URL for requests.
M:GET                 Set method for requests.
<:somename            Retrieve a named configuration
>:somename            Give the current configuration a name
H:Header: value       Set a header
-:/url/suffix         Issue a bodyless request
+:/url/suffix 12345   Issue a request with a body
Opcodes V, B, M and H are hopefully self-explanatory. I’ll
explore < and > below. The opcodes - and + actually complete
each request and tell the server to process the message.
Opcode - takes as its argument a URL fragment that gets appended to
the base URL set by opcode B. Opcode + does the same, but also
takes an ASCII Content-Length value, which tells the server to read
that many bytes after the CRLF of the + line, and to use the bytes
read as the entity body of the HTTP request.
Content-Length is a slightly weird header, more properly associated
with the entity body than the headers proper, which is why it gets
special treatment. (We could also come up with a syntax for indicating
chunked transfer encoding for the entity body.)
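The opcode descriptions above are enough to sketch a decoder that turns an encoded byte stream back into ordinary HTTP requests. This is a minimal Python sketch, not a reference implementation: decode_stream is a hypothetical name, the initial defaults (HTTP/1.1, GET, empty base URL) are assumptions, as is the rule that setting a header twice replaces its earlier value. The < and > opcodes are deliberately left out here.

```python
def decode_stream(data):
    """Translate an encoded byte stream into a list of plain HTTP requests."""
    version, base, method = "HTTP/1.1", "", "GET"  # assumed defaults
    headers = {}  # name -> value; a repeated name overwrites (assumption)
    requests = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\r\n", pos)  # every opcode line ends with CRLF
        opcode, _, arg = data[pos:end].decode("ascii").partition(":")
        pos = end + 2
        if opcode == "V":
            version = arg
        elif opcode == "B":
            base = arg
        elif opcode == "M":
            method = arg
        elif opcode == "H":
            name, _, value = arg.partition(":")
            headers[name] = value.strip()
        elif opcode in ("-", "+"):
            suffix, body = arg, b""
            if opcode == "+":
                # The argument is "<url suffix> <content length>".
                suffix, _, length = arg.rpartition(" ")
                body = data[pos:pos + int(length)]
                pos += int(length)
            head = ["%s %s%s %s" % (method, base, suffix, version)]
            head += ["%s: %s" % item for item in headers.items()]
            if opcode == "+":
                head.append("Content-Length: %d" % len(body))
            requests.append("\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body)
        else:
            raise ValueError("unsupported opcode: %r" % (opcode,))
    return requests
```

The connection state lives in local variables that survive across requests, which is the whole point: a - or + line emits a request without clearing anything.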
As an example, let’s encode the following POST request:
POST /someurl HTTP/1.1
Host: relay.localhost.lshift.net:8000
Content-Type: text/plain
Accept-Encoding: identity
Content-Length: 13

hello world
Encoded, this becomes
V:HTTP/1.1
B:/someurl
M:POST
H:Host: relay.localhost.lshift.net:8000
H:Content-Type: text/plain
H:Accept-Encoding: identity
+: 13
hello world
Not an obvious improvement. However, consider issuing 100 copies of that same request on a single connection. With plain HTTP, all the headers are repeated; with our encoded HTTP, the only part that is repeated is:
+: 13
hello world
Instead of sending (151 * 100) = 15100 bytes, we now send 130 + (20 * 100) = 2130 bytes.
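The byte counts can be checked mechanically. The short Python sketch below assumes every line ends with CRLF and that the 13-byte body is "hello world" followed by a CRLF, which is the reading that makes Content-Length: 13 come out right; wire_size is a hypothetical helper.

```python
CRLF = "\r\n"


def wire_size(lines, body=""):
    """Bytes on the wire for CRLF-terminated lines followed by a body."""
    return sum(len(line + CRLF) for line in lines) + len(body)


body = "hello world" + CRLF  # assumed: the trailing CRLF is part of the body

plain = wire_size(
    [
        "POST /someurl HTTP/1.1",
        "Host: relay.localhost.lshift.net:8000",
        "Content-Type: text/plain",
        "Accept-Encoding: identity",
        "Content-Length: 13",
        "",  # blank line separating headers from body
    ],
    body,
)

setup = wire_size(
    [
        "V:HTTP/1.1",
        "B:/someurl",
        "M:POST",
        "H:Host: relay.localhost.lshift.net:8000",
        "H:Content-Type: text/plain",
        "H:Accept-Encoding: identity",
    ]
)

repeat = wire_size(["+: 13"], body)

print("plain HTTP, 100 requests:", plain * 100)
print("encoded HTTP, 100 requests:", setup + repeat * 100)
```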
The scheme as described so far takes care of the unchanging parts of
repeated HTTP requests; for the changing parts, such as Accept and
Referer headers, we need to make use of the < and >
opcodes. Before I get into that, though, let’s take a look at how the
scheme so far might work in the case of OPRA.
Each OPRA quote update is on average 66 bytes long, making for around 63MB/s of raw content.
Let’s imagine that each delivery appears as a separate HTTP request:
POST /receiver HTTP/1.1
Host: opra-receiver.example.com
Content-Type: application/x-opra-quote
Accept-Encoding: identity
Content-Length: 66

blablablablablablablablablablablablablablablablablablablablablabla
That’s 213 bytes long: an overhead of 220% over the raw message content.
Encoded using the stateful scheme above, the first request appears on the wire as
V:HTTP/1.1
B:/receiver
M:POST
H:Host: opra-receiver.example.com
H:Content-Type: application/x-opra-quote
H:Accept-Encoding: identity
+: 66
blablablablablablablablablablablablablablablablablablablablablabla
and subsequent requests as
+: 66
blablablablablablablablablablablablablablablablablablablablablabla
for an amortized per-request size of 73 bytes: a much less problematic overhead of 11%. In summary:
| Encoding | Bytes per message body | Per-message overhead (bytes) | Size increase over raw content | Bandwidth at 1M msgs/sec |
|---|---|---|---|---|
| Plain HTTP | 66 | 147 | 220% | 203.1 MBy/s |
| Encoded HTTP | 66 | 7 | 11% | 69.6 MBy/s |
Using plain HTTP, the feed doesn’t fit on a gigabit ethernet. Using our encoding scheme, it does.
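The bandwidth column follows directly from the per-message sizes. This Python sketch reproduces the arithmetic under two assumptions of mine: that "MBy" in the table means 2^20 bytes (which is what makes the figures come out as shown), and that a gigabit link carries at most 10^9 / 8 bytes per second with framing overhead ignored. The function names are hypothetical.

```python
MIB = 2 ** 20                          # assumed meaning of "MBy" in the table
GIGABIT_BYTES_PER_S = 1_000_000_000 / 8  # link capacity, ignoring framing


def mib_per_second(bytes_per_message, messages_per_s=1_000_000):
    """Bandwidth needed for a message stream, in MiB per second."""
    return bytes_per_message * messages_per_s / MIB


def fits_gigabit(bytes_per_message, messages_per_s=1_000_000):
    """True if the stream fits within a single gigabit link."""
    return bytes_per_message * messages_per_s <= GIGABIT_BYTES_PER_S


sizes = {"raw content": 66, "plain HTTP": 213, "encoded HTTP": 73}
for name, size in sizes.items():
    verdict = "fits" if fits_gigabit(size) else "does not fit"
    print(f"{name}: {mib_per_second(size):.1f} MiB/s, {verdict} on gigabit ethernet")
```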
Besides the savings in terms of bandwidth, the encoding scheme could also help with saving CPU. After processing the headers once, the results of the processing could be cached, avoiding unnecessary repetition of potentially expensive calculations such as routing, authentication, and authorisation.
Above, I mentioned that some headers changed, while others stayed the
same from request to request. The < and > opcodes are intended to
deal with just this situation.
The > opcode stores the current state in a named register, and the
< opcode loads the current state from a register. Headers that don’t
change between requests are placed into a register, and each request
loads from that register before setting its request-specific headers.
To illustrate, imagine the following two requests:
GET / HTTP/1.1
Host: www.example.com
Cookie: key=value
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

GET /style.css HTTP/1.1
Host: www.example.com
Cookie: key=value
Referer: http://www.example.com/
Accept: text/css,*/*;q=0.1
One possible encoding is:
V:HTTP/1.1
B:/
M:GET
H:Host: www.example.com
H:Cookie: key=value
>:config1
H:Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
-:
<:config1
H:Referer: http://www.example.com/
H:Accept: text/css,*/*;q=0.1
-:style.css
By using <:config1, the second request reuses the stored settings
for the method, base URL, HTTP version, and Host and Cookie
headers.
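One detail worth spelling out is that a register has to hold a snapshot of the state, not a reference to it; otherwise headers set after the > line would leak into the saved configuration. The Python sketch below models only bodyless requests and works on already-split opcode lines; replay is a hypothetical name, and the behaviour of overwriting a repeated header is an assumption.

```python
import copy

FIELDS = {"V": "version", "B": "base", "M": "method"}


def replay(lines):
    """Apply opcode lines in order; return (method, url, headers) per request."""
    state = {"version": None, "base": "", "method": None, "headers": {}}
    registers = {}
    requests = []
    for line in lines:
        opcode, _, arg = line.partition(":")
        if opcode == ">":
            registers[arg] = copy.deepcopy(state)  # snapshot, not a reference
        elif opcode == "<":
            state = copy.deepcopy(registers[arg])  # the register stays pristine
        elif opcode == "H":
            name, _, value = arg.partition(":")
            state["headers"][name] = value.strip()
        elif opcode in FIELDS:
            state[FIELDS[opcode]] = arg
        elif opcode == "-":
            url = state["base"] + arg
            requests.append((state["method"], url, dict(state["headers"])))
        else:
            raise ValueError("unsupported opcode: %r" % (opcode,))
    return requests
```

Replaying the encoding above yields two GET requests, and the second one carries Host, Cookie, Referer and its own Accept header, with no trace of the first request's Accept value.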
Most applications of HTTP do fine using ordinary HTTP syntax. I’m not suggesting changing HTTP, or trying to get an encoding scheme like this deployed in any browser or webserver at all. The point of the exercise is to consider how low one might make the bandwidth overheads of a text-based protocol like HTTP for the specific case of a high-speed messaging scenario.
In situations where the semantics of HTTP make sense, but the syntax is just too verbose, schemes like this one can be useful on a point-to-point link. There’s no need for global support for an alternative syntax, since people who are already forming very specific contracts with each other for the exchange of information can choose to use it, or not, on a case-by-case basis.
Instead of specifying a whole new transport protocol for high-speed links, people can reuse the considerable amount of work that’s gone into HTTP, without paying the bandwidth price.
Just as a throwaway comparison, I computed the minimum possible
overhead for sending a 66-byte message using AMQP 0-8 or 0-9. Using a
single-letter queue name, “q”, the overhead is 69 bytes per message,
or 105% of the message body. For our OPRA example at 1M messages per
second, that works out at 128.7 megabytes per second, and we’re back
over the limit of a single gigabit ethernet again. Interestingly,
despite AMQP’s binary nature, its overhead is much higher than a
simple syntactic rearrangement of a text-based protocol in this case.
We considered the overhead of using plain HTTP in a high-speed messaging scenario, and invented a simple alternative syntax for HTTP that drastically reduces the wasted bandwidth.
For the specific example of the OPRA feed, the computed bandwidth requirement of the experimental syntax is only 11% higher than that of the raw data itself, and roughly a third of what ordinary HTTP needs.