Posts filed under 'Standards'

E4X: Not as awful as I thought

Long, long ago, I complained about various warts and infelicities in E4X, the ECMAScript extensions for generating and pattern-matching XML documents. It turns out that two of my complaints were not well-founded: sequence-splicing is supported, and programmatic construction of tags is possible.

Firstly (and I’m amazed I didn’t realise this at the time, as I was using it elsewhere), it’s not a problem at all to splice in a sequence of items, in the manner of Scheme’s unquote-splicing; here’s a working solution to the problem I set myself:

function buildItems() {
  return <>
           <item>Hello</item>
           <item>World!</item>
         </>;
}
var doc = <mydocument>{buildItems()}</mydocument>;

You can even use real Arrays (which is what I tried and failed to do earlier), by guerilla-patching Array.prototype:

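// Copy each array element into a scratch element, then return that
// element's children as an XMLList, which E4X splices in as a sequence.
Array.prototype.toXMLList = function () {
    var x = <container/>;
    for (var i = 0; i < this.length; i++) {
        x.appendChild(this[i]);
    }
    return x.children();
};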
function buildItems() {
    return [<item>Hello</item>,
            <item>World!</item>].toXMLList();
}
var doc = <mydocument>{buildItems()}</mydocument>;

Programmatic construction of tags uses the syntax for plain old unquote in an unusual position, inside the tag’s angle-brackets:

var tagName = "p";
var doc = <{tagName}>test</{tagName}>;
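For comparison outside E4X, here is a minimal sketch of the same two operations (splicing a sequence of children, and choosing a tag name at run time) using Python's standard xml.etree.ElementTree. The function name build_items and the variable names are my own, chosen to mirror the examples above; this is an illustration, not a translation of any E4X implementation.

```python
import xml.etree.ElementTree as ET


def build_items():
    # An ordinary Python list of elements, analogous to the Array above.
    items = []
    for text in ["Hello", "World!"]:
        item = ET.Element("item")
        item.text = text
        items.append(item)
    return items


tag_name = "mydocument"        # tag name chosen at run time, not written literally
doc = ET.Element(tag_name)
doc.extend(build_items())      # splice the whole sequence in as children

print(ET.tostring(doc, encoding="unicode"))
# prints <mydocument><item>Hello</item><item>World!</item></mydocument>
```

Here the splice is an explicit method call (extend) rather than part of a literal syntax, which is the library-extension style of working with XML.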

So in summary, my original expectation that E4X should turn out to be very quasiquote-like wasn’t so far off the mark. It’s enough to get the basics done (ignoring for the minute the problems with namespace prefixes), but it’s still a bit of a bolt-on afterthought; it would have been nice to see it better integrated with the rest of the language.

2 comments May 7th, 2008 tonyg

Astral Plane characters in Erlang JSON/RFC4627 implementation

Sam Ruby examines support for astral-plane characters in various JSON implementations. His post prompted me to check my Erlang implementation of RFC 4627. I found that astral-plane characters encoded directly in UTF-8, UTF-16, or UTF-32 all worked properly, but the surrogate-pair “\uXXXX” escapes that RFC 4627 mandates broke. A few minutes’ hacking later:

Eshell V5.5.5  (abort with ^G)
1> {ok, Utf8Encoded, []} =
        rfc4627:decode("\"\\u007a\\u6c34\\ud834\\udd1e\"").
{ok,<<122,230,176,180,240,157,132,158>>,[]}
2> xmerl_ucs:from_utf8(Utf8Encoded).
[122,27700,119070]
3> rfc4627:encode(Utf8Encoded).
[34,122,230,176,180,240,157,132,158,34]
4> 

Much better.
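The arithmetic behind that session can be checked independently. The sketch below is Python rather than Erlang and has nothing to do with the internals of the rfc4627 module; the helper name combine_surrogates is my own. It combines the surrogate pair from the example by hand and compares the result with what Python's standard json module produces for the same input string.

```python
import json


def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The same JSON text as in the Erlang session above.
decoded = json.loads('"\\u007a\\u6c34\\ud834\\udd1e"')

code_points = [ord(ch) for ch in decoded]
utf8_bytes = list(decoded.encode("utf-8"))

print(combine_surrogates(0xD834, 0xDD1E))  # prints 119070
print(code_points)                         # prints [122, 27700, 119070]
print(utf8_bytes)                          # prints [122, 230, 176, 180, 240, 157, 132, 158]
```

The code points and UTF-8 bytes agree with the output of xmerl_ucs:from_utf8 and rfc4627:decode shown above.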

You can get the updated code using darcs:

darcs get http://www.lshift.net/~tonyg/erlang-rfc4627/

Add comment November 16th, 2007 tonyg

Invitation to AMQP and RabbitMQ Birds of a Feather session

I am guest blogging here on behalf of CohesiveFT. We work with the excellent LShift team on our joint venture, RabbitMQ.

I’m here to invite you to a Birds of a Feather session this coming Thursday, August 30th, at 8pm, in central London. It is free and will last for 45 minutes, followed by the traditional breakout discussions over a beer. Please do take a look at RabbitMQ if you have not yet done so. It’s a commercial open source product, available under the MPL 1.1, that implements the Advanced Message Queuing Protocol. AMQP is a new way to do business messaging (i.e. “what goes in, must come out”). What’s really cool is that, like HTTP, it is a protocol rather than a language-specific API. This should make interoperability between platforms much easier and less painful (business readers: “systems integration projects take less time and success can be predicted more accurately”). For more information, please see my list of links here.

What is the BOF about, and why come? It’s an informal session about RabbitMQ and AMQP, and how they apply within popular environments such as Spring, Mule, Ruby, and AJAX, and alongside other messaging protocols such as FIX.

“Informal” means we’ll be encouraging a conversation between people interested in any of these things. We want to hear from you, and from each other, rather than pushing slideware at people.

Come if you want to:

You can find out details of the BOF here. Ideally we ask you to register via the web site, but late arrivals are very welcome - if you turn up, we shall get you in. The BOF is offered as part of the popular EJUG series of tech talks and as a tie-in with the most excellent No Fluff Just Stuff conference.

If you cannot come but want to know more about any of these things then you can email us at info@rabbitmq.com.

Thank you very much - and we hope to see you on Thursday :-)

Posted by Chris on behalf of Alexis Richardson, CohesiveFT.

2 comments August 28th, 2007 chris

RFC 1982 limits itself to powers of two unnecessarily

RFC 1982 defines a “Serial Number Arithmetic” for use when you have a fixed number of bits available for some monotonically increasing sequence identifier, such as the DNS SOA record serial number, or message IDs in some messaging protocol. It defines all its operations with respect to some power of two, (2^SERIAL_BITS). It struck me just now that there’s no reason why you couldn’t generalise to any even number. You’d simply replace any mention of (2^SERIAL_BITS) by, say, N, and any mention of (2^(SERIAL_BITS-1)) by (N/2). The definitions for addition and comparison still seem to hold just as well.

One of the reasons I was thinking along these lines is that in Erlang, it’s occasionally useful to model a queue in an ETS table or in a process dictionary. If one didn’t mind setting an upper bound on the length of one’s modelled queue, then by judicious use of RFC 1982-style sequence number wrapping, one might ensure that the space devoted to the sequence numbering required of the model remained bounded. Using a generalised variant of RFC 1982 arithmetic, one becomes free to choose any even number as the queue length bound, rather than only a power of two.
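A minimal sketch of the generalised arithmetic, in Python for brevity. It assumes the substitution described above: N (here the parameter modulus) replaces 2^SERIAL_BITS and N/2 replaces 2^(SERIAL_BITS-1). The function names are my own. As in RFC 1982, two serial numbers exactly N/2 apart have no defined ordering; this sketch reports neither as less than the other.

```python
def check_modulus(modulus):
    # The generalisation needs N/2 to be an integer, so N must be even.
    if modulus <= 0 or modulus % 2 != 0:
        raise ValueError("modulus must be a positive even number")


def serial_add(s, n, modulus):
    """Add n to serial number s, wrapping at modulus (N)."""
    check_modulus(modulus)
    # RFC 1982 only defines addition for increments below 2^(SERIAL_BITS-1),
    # which becomes N/2 here.
    if not 0 <= n < modulus // 2:
        raise ValueError("increment must be in the range 0 .. N/2 - 1")
    return (s + n) % modulus


def serial_lt(a, b, modulus):
    """True if serial number a precedes b in the wrapped ordering."""
    check_modulus(modulus)
    distance = (b - a) % modulus      # how far forward b is from a
    return 0 < distance < modulus // 2


N = 10                                # an even bound that is not a power of two
print(serial_add(8, 3, N))            # prints 1
print(serial_lt(8, 1, N))             # prints True
print(serial_lt(1, 8, N))             # prints False
```

With N = 10, serial number 1 counts as coming after 8 because it is only three steps ahead once wrapping is taken into account.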

2 comments February 17th, 2007 tonyg

HTML Email is hard to get right

For a recent project, we developed support for sending automatically-generated HTML emails. Now, most people do this by including a message body with MIME-type text/html. For extra points, sometimes there’s also a text/plain part alongside the HTML in a multipart/alternative container.

The problem with doing things this way is that you can’t include any images or other resources (such as CSS) as separate parts of the email linked to from the main HTML body-part. For that, you need to use the multipart/related MIME-type. Unfortunately, few commonly-used email clients render multipart/related HTML-plus-resource aggregations well.

We only tried the arrangement where the multipart/related part, containing the main HTML page and its associated resources, was a sibling of the text/plain part within the multipart/alternative container. The inverse arrangement, with the multipart/alternative as the main document within the multipart/related part, is something we have yet to experiment with.

Here’s a picture of the structure of our initial attempts:

multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html
      +-- image/gif
      +-- text/css

This worked reasonably well in Thunderbird and Outlook 2002, but we had consistent reports from our customer that the images and stylesheet would randomly fail to display in Outlook 2003 (SP2). After lots of mucking around trying to get Outlook to either work reliably or fail reliably, we gave up on that line and instead simplified the structure of our emails, putting the CSS styling inline in the HTML HEAD element:

multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css inline in HEAD)
      +-- image/gif

This didn’t work particularly well, either: it seems many email clients ignore styles set in the HEAD element. Finally, we moved to applying CSS styling inline, using a style attribute on each styled element. We were able to use an XSLT transformation to allow us to write clean HTML and apply the CSS style attribute automatically. The final structure of the emails we sent:

multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css copied on to each element!)
      +-- image/gif

This seems to work more-or-less reliably across

  • Thunderbird
  • Outlook 2002
  • Outlook 2003 SP2
  • Google Gmail
  • MS Hotmail

If I were to do it all again, I’d give serious consideration to the traditional non-multipart text/html solution with images hosted by some public-facing web server. We managed to get our multipart-HTML-emails working acceptably, but only by the skin of our teeth.
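For anyone wanting to reproduce the final structure, here is a minimal sketch using Python's standard email package. It is not the code from the project described above: the addresses, subject, content-ID domain and the placeholder GIF bytes are all made up for illustration, and the style attribute is written by hand rather than generated by XSLT.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Placeholder bytes standing in for a real GIF file (assumption).
GIF_BYTES = b"GIF89a"


def build_message():
    msg = EmailMessage()
    msg["Subject"] = "Example report"
    msg["From"] = "sender@example.invalid"
    msg["To"] = "recipient@example.invalid"

    # The text/plain alternative.
    msg.set_content("Plain-text version of the report.")

    # The HTML alternative, with styling inline on the element itself and
    # the image referenced by Content-ID.
    image_cid = make_msgid(domain="example.invalid")
    html = (
        "<html><body>"
        '<p style="font-family: sans-serif">HTML version of the report.</p>'
        '<img src="cid:{cid}" alt="logo">'
        "</body></html>"
    ).format(cid=image_cid[1:-1])     # strip the surrounding angle brackets
    msg.add_alternative(html, subtype="html")

    # Turn the text/html part into multipart/related and attach the image.
    html_part = msg.get_payload()[1]
    html_part.add_related(GIF_BYTES, "image", "gif", cid=image_cid)
    return msg


message = build_message()
for part in message.walk():
    print(part.get_content_type())
```

Walking the message lists multipart/alternative, text/plain, multipart/related, text/html and image/gif, in that order, which matches the final diagram.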

2 comments July 18th, 2006 tonyg

E4X: I want my S-expressions back

E4X is a new ECMA standard (ECMA-357) specifying an extension to ECMAScript for streamlining work with XML documents.

It adds objects representing XML to ECMAScript, and extends the syntax to allow literal XML fragments to appear in code. It also supports a very XPath-like notation for use in extracting data from XML objects. So far, so good - all these things are somewhat useful. However, there are serious problems with the design of the extension.

If E4X objects were real objects, if there were a means of splicing a sequence of child nodes into XML literal syntax, and if E4X supported XML namespace prefixes properly, most of my objections would be dealt with. As it stands, the overall verdict is “clunky at best”.

These are my main complaints:

  • It doesn’t do anything like Scheme’s unquote-splicing, and so using E4X to produce XML objects is verbose, error-prone and dangerous in concurrent settings.

    There seems to be no way of splicing in a sequence of items - I’d like to do something like the following:

    function buildItems() {
      return [<item>Hello</item>,
              <item>World!</item>];
    }
    var doc = <mydocument>{buildItems()}</mydocument>;
    

    and have doc contain

    <mydocument>
      <item>Hello</item>
      <item>World!</item>
    </mydocument>
    

    What actually results is the more-or-less useless

    <mydocument>Hello,World!</mydocument>
    

    The closest I can get to the result I’m after is

    function buildItems(n) {
      n.mydocument += <item>Hello</item>;
      n.mydocument += <item>World!</item>;
    }
    var doc = <mydocument></mydocument>;
    buildItems(doc);
    
  • It’s full of redundant redundancy - it’s as verbose as XML, when you can do so much better.

  • There’s no toXML() method (or similar) for use in papering over the yawning chasm between the XML objects and the rest of the language: you can’t even make a Javascript object able to seamlessly render itself to XML.

  • The new types E4X introduces aren’t even proper objects - they’re a whole new class of primitive datum!

  • Because they’re not proper objects, you can’t extend the system. You ought to be able to implement to an interface and benefit from the language’s XPath searching and filtering operations. E4X is so close to offering a comprehension facility for Javascript, but it’s been short-sightedly restricted to a single class of non-extensible primitives.

  • You can’t even construct XML tags programmatically! If the name of the tag doesn’t appear literally in your Javascript code, you’re out of luck (unless you resort to eval…) [[Update: I was wrong about this - you can write <{expr}> and have the result of evaluating expr substituted into the tag.]]

  • E4X XML objects have no notion of namespace prefixes (which are required for quality implementations of XPath and anything to do with XML signatures). Prefixes only turn up in the API as a means of producing (namespaceURI,localname) pairs. This is actually how it should be, but because there’s already broken software out there that depends on prefix support, by not supporting prefixes properly you preclude ECMAScript+E4X from being used for XML signatures or ECMAScript-native XPath implementations.

In my opinion, E4X violates several programming language design principles: most importantly, those of regularity, simplicity and orthogonality, but also preservation of information, automation and structure. SXML, perhaps in combination with eager comprehensions, provides a far superior model for producing and consuming XML. Sadly, there’s no real alternative for ECMAScript yet - we’re limited either to library extensions, or to using the DOM without any syntactic or library support at all.

5 comments June 24th, 2006 tonyg

XML tunnel-vision

In Ant 1.6, properties can be written in XML files. Can someone tell me why

<property name="some.property" value="some.value"/>

is more desirable than

some.property=some.value
?

Update: The import feature is what’s new in Ant 1.6 that makes this usage possible. So, the answer is, “because you can conditionally set properties in the imported files” (rather than conditionally evaluating property files). So really my gripe is (again) with the weird, stilted little language that is Ant.

Add comment September 22nd, 2005 mikeb

Semantics in HTML via typographic convention

There’s a summary of an interesting discussion regarding semantics in HTML over on fantasai’s blog. Is the HR element only presentational, or does it convey something about the content? It does seem to have a semantic role, but one that derives from the typographical convention of using a row of asterisks to mark a change of context. The interesting bit is what comes after the discussion of HR (and SPAN and DIV): fantasai doesn’t say it explicitly, but Hixie’s proposed reforms to the HTML standard are in the same vein as the treatment of HR. They clarify what an element means by referring to what the corresponding typographical convention means. Compare the Web Applications 1.0 draft to the XHTML 2.0 draft.

1 comment August 8th, 2005 mikeb

Browser-side XForms

At last there appears to be a working implementation of XForms that is written in Javascript and runs entirely inside a browser: FormFaces. The clear separation of content from presentation, and the declarative nature of XForms have always appealed to me, but the lack of a browser-side implementation has so far put me off from using it. Perhaps it’s time for another try?

Add comment July 22nd, 2005 matthias
