Posts filed under 'Standards'
Long, long ago, I complained about various warts and infelicities in E4X, the ECMAScript extensions for generating and pattern-matching XML documents. It turns out that two of my complaints were not well-founded: sequence-splicing is supported, and programmatic construction of tags is possible.
Firstly (and I’m amazed I didn’t realise this at the time, as I was using it elsewhere), it’s not a problem at all to splice in a sequence of items, in the manner of Scheme’s unquote-splicing; here’s a working solution to the problem I set myself:
function buildItems() {
    return <>
        <item>Hello</item>
        <item>World!</item>
    </>;
}
var doc = <mydocument>{buildItems()}</mydocument>;
You can even use real Arrays (which is what I tried and failed to do earlier), by guerrilla-patching Array.prototype:
// Convert an Array of XML values into an XMLList by way of a scratch element.
Array.prototype.toXMLList = function () {
    var x = <container/>;
    for (var i = 0; i < this.length; i++) {
        x.appendChild(this[i]);
    }
    // children() returns the appended nodes as an XMLList.
    return x.children();
};
function buildItems() {
    return [<item>Hello</item>,
            <item>World!</item>].toXMLList();
}
var doc = <mydocument>{buildItems()}</mydocument>;
Programmatic construction of tags uses the syntax for plain old unquote in an unusual position, inside the tag’s angle brackets:
var tagName = "p";
var doc = <{tagName}>test</{tagName}>;
So in summary, my original expectation that E4X should turn out to be very quasiquote-like wasn’t so far off the mark. It’s enough to get the basics done (ignoring for the minute the problems with namespace prefixes), but it’s still a bit of a bolt-on afterthought; it would have been nice to see it better integrated with the rest of the language.
May 7th, 2008
tonyg
Sam Ruby examines support for astral-plane characters in various JSON implementations. His post prompted me to check my Erlang implementation of RFC 4627. I found that for astral-plane characters in utf-8, utf-16, or utf-32, everything worked properly, but the RFC 4627-mandated surrogate-pair “\uXXXX” encodings broke. A few minutes’ hacking later, and:
Eshell V5.5.5 (abort with ^G)
1> {ok, Utf8Encoded, []} =
rfc4627:decode("\"\\u007a\\u6c34\\ud834\\udd1e\"").
{ok,<<122,230,176,180,240,157,132,158>>,[]}
2> xmerl_ucs:from_utf8(Utf8Encoded).
[122,27700,119070]
3> rfc4627:encode(Utf8Encoded).
[34,122,230,176,180,240,157,132,158,34]
4>
Much better.
You can get the updated code using darcs:
darcs get http://www.lshift.net/~tonyg/erlang-rfc4627/
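For readers who want to see the surrogate-pair arithmetic spelled out, here is a minimal Python sketch. It is not the Erlang module's code: the helper name combine_surrogates is hypothetical, and the cross-check uses Python's standard json module rather than rfc4627.

```python
import json


def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single Unicode code point.

    Hypothetical helper for illustration; not part of the rfc4627 module.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits of the code point's offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The same JSON text as in the Erlang session: "z", U+6C34, then U+1D11E
# written as the escaped surrogate pair \ud834\udd1e.
json_text = '"\\u007a\\u6c34\\ud834\\udd1e"'
decoded = json.loads(json_text)

code_points = [ord(ch) for ch in decoded]
utf8_bytes = list(decoded.encode("utf-8"))
```

Here code_points and utf8_bytes hold the same numbers that the shell session prints: one code point per character, and the UTF-8 encoding of the whole string.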
November 16th, 2007
tonyg
I am guest blogging here on behalf of CohesiveFT. We work with the excellent LShift team on our joint venture, RabbitMQ.
I’m here to invite you to a Birds of a Feather session this coming Thursday, August 30th, in central London. It is free, starts at 8pm and lasts 45 minutes, and will be followed by the traditional breakout discussions over a beer.
Please do take a look at RabbitMQ if you have not yet done so. It’s a commercial open source product, available under the MPL 1.1 and implementing the Advanced Message Queuing Protocol. AMQP is a new way to do business messaging (i.e. “what goes in, must come out”). What’s really cool is that, like HTTP, it is a protocol instead of a language-specific API. This should make interoperability between platforms much easier and less painful (business readers: “systems integration projects take less time and success can be predicted more accurately”). For more information, please see my list of links here.
What is the BOF about - and why come? It’s an informal session about RabbitMQ and AMQP, and how they apply within popular environments such as Spring, Mule, Ruby, AJAX, and other messaging protocols such as FIX.
“Informal” means we’ll be encouraging a conversation between people interested in any of these things. We want to hear from you, and from each other, rather than pushing slideware at people.
Come if you want to:
You can find out details of the BOF here. Ideally we ask you to register via the web site, but late arrivals are very welcome - if you turn up, we shall get you in. The BOF is offered as part of the popular EJUG series of tech talks and as a tie-in with the most excellent No Fluff Just Stuff conference.
If you cannot come but want to know more about any of these things then you can email us at info@rabbitmq.com.
Thank you very much - and we hope to see you on Thursday :-)
Posted by Chris on behalf of Alexis Richardson, CohesiveFT.
August 28th, 2007
chris
RFC 1982 defines a “Serial Number Arithmetic”, for use when you have a fixed number of bits available for some monotonically increasing sequence identifier, such as the DNS SOA record serial number, or message IDs in some messaging protocol. It defines all its operations with respect to some power of two, (2^SERIAL_BITS). It struck me just now that there’s no reason why you couldn’t generalise to any even number. You’d simply replace any mention of (2^SERIAL_BITS) by, say, N, and any mention of (2^(SERIAL_BITS-1)) by (N/2). The definitions for addition and comparison still seem to hold just as well.
One of the reasons I was thinking along these lines is that in Erlang, it’s occasionally useful to model a queue in an ETS table or in a process dictionary. If one didn’t mind setting an upper bound on the length of one’s modelled queue, then by judicious use of RFC 1982-style sequence number wrapping, one might ensure that the space devoted to the sequence numbering required of the model remained bounded. Using a generalised variant of RFC 1982 arithmetic, one becomes free to choose any number as the queue length bound, rather than any power of two.
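The generalisation can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the post: the function names serial_add and serial_lt and the modulus parameter (the N above) are mine, and the rules are RFC 1982's addition and comparison definitions with N substituted for (2^SERIAL_BITS) and N/2 for (2^(SERIAL_BITS-1)).

```python
def _half(modulus):
    """Return N/2, insisting that the modulus N is a positive even number."""
    if modulus < 2 or modulus % 2 != 0:
        raise ValueError("modulus must be a positive even number")
    return modulus // 2


def serial_add(s, n, modulus):
    """Add n to serial number s, wrapping at modulus.

    As in RFC 1982, the increment is limited to the range 0 .. N/2 - 1.
    """
    half = _half(modulus)
    if not 0 <= n <= half - 1:
        raise ValueError("increment must be in the range 0 .. N/2 - 1")
    return (s + n) % modulus


def serial_lt(i1, i2, modulus):
    """Return True if i1 is 'less than' i2 in serial number order.

    When the two values are exactly N/2 apart the ordering is undefined,
    and this function returns False in both directions.
    """
    half = _half(modulus)
    return (i1 < i2 and i2 - i1 < half) or (i1 > i2 and i1 - i2 > half)
```

With N = 10, for example, serial_add(8, 4, 10) wraps around to 2, and serial_lt(8, 2, 10) is true even though 8 > 2 as plain integers, which is the behaviour a wrapping queue index needs.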
February 17th, 2007
tonyg
For a recent project, we developed support for sending
automatically-generated HTML emails. Now, most people do this by
including a message body with MIME-type text/html. For extra points,
sometimes there’s also a text/plain part alongside the HTML in a
multipart/alternative container.
The problem with doing things this way is that you can’t include any
images or other resources (such as CSS) as separate parts of the email
linked to from the main HTML body-part. For that, you need to use the
multipart/related MIME-type. Unfortunately, few commonly-used email
clients render multipart/related HTML-plus-resource aggregations well.
We only tried the arrangement where the multipart/related,
containing the main HTML page and its associated resources, was a
sibling of the text/plain part within the
multipart/alternative container. The inverse arrangement,
with the multipart/alternative as the main document within
the multipart/related part, is something we have yet to
experiment with.
Here’s a picture of the structure of our initial attempts:
multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html
      +-- image/gif
      +-- text/css
This worked reasonably well in Thunderbird and Outlook 2002,
but we had consistent reports from our customer that the images and
stylesheet would randomly fail to display in Outlook 2003 (SP2). After
lots of mucking around trying to get Outlook to either work reliably
or fail reliably, we gave up on that line and instead simplified the
structure of our emails, putting the CSS styling inline in the HTML
HEAD element:
multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css inline in HEAD)
      +-- image/gif
This didn’t work particularly well, either: it seems many email
clients ignore styles set in the HEAD element. Finally, we
moved to applying CSS styling inline, using a style attribute
on each styled element. We were able to use an XSLT transformation to
allow us to write clean HTML and apply the CSS style
attribute automatically. The final structure of the emails we sent:
multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css copied on to each element!)
      +-- image/gif
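To illustrate the idea of copying styles on to each element, here is a rough Python sketch. It is not the XSLT transformation we used: it swaps in the standard library's ElementTree, assumes the page is well-formed XHTML, and supports only rules keyed by element name. The function inline_styles and the shape of the rules dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def inline_styles(xhtml_source, rules):
    """Copy per-element CSS declarations into style attributes.

    xhtml_source must be well-formed XML; rules maps an element name
    (for example "p") to a CSS declaration string. Hypothetical helper,
    standing in for the XSLT transformation described above.
    """
    root = ET.fromstring(xhtml_source)
    for element in root.iter():
        declaration = rules.get(element.tag)
        if declaration is None:
            continue
        existing = element.get("style")
        # Put the rule first so that a style already written on the
        # element comes later and therefore takes precedence.
        if existing:
            element.set("style", declaration + "; " + existing)
        else:
            element.set("style", declaration)
    return ET.tostring(root, encoding="unicode")
```

Calling inline_styles(page, {"h1": "color: #333"}) leaves the authored markup clean while the generated email carries a style attribute on every h1.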
This seems to work more-or-less reliably across
- Thunderbird
- Outlook 2002
- Outlook 2003 SP2
- Google Gmail
- MS Hotmail
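For comparison, the final structure above can be assembled with Python's standard email package. This is a sketch rather than the code from our project: build_html_email and outline are hypothetical helpers, the Content-ID default is a made-up value, and the HTML body is expected to reference the image as cid: followed by that Content-ID.

```python
from email.message import EmailMessage


def build_html_email(plain_text, html, gif_bytes,
                     image_cid="logo@example.invalid"):
    """Build multipart/alternative holding text/plain and multipart/related.

    Hypothetical helper: image_cid is an illustrative Content-ID that the
    HTML is assumed to reference as src="cid:logo@example.invalid".
    """
    msg = EmailMessage()
    msg.set_content(plain_text)                # becomes the text/plain part
    msg.add_alternative(html, subtype="html")  # msg is now multipart/alternative
    html_part = msg.get_payload()[1]
    # Attaching a related resource turns the HTML part into multipart/related.
    html_part.add_related(gif_bytes, maintype="image", subtype="gif",
                          cid="<" + image_cid + ">")
    return msg


def outline(part, depth=0):
    """Return the MIME tree as a list of indented content-type strings."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(outline(child, depth + 1))
    return lines
```

Printing "\n".join(outline(msg)) for a message built this way gives a quick check that the parts nest as intended before any email client sees them.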
If I were to do it all again, I’d give serious consideration to the
traditional non-multipart text/html solution with images
hosted by some public-facing web server. We managed to get our
multipart-HTML-emails working acceptably, but only by the skin of our
teeth.
July 18th, 2006
tonyg
E4X is a new ECMA standard (ECMA-357) specifying an extension to
ECMAScript for streamlining work with XML documents.
It adds objects representing XML to ECMAScript, and extends the syntax
to allow literal XML fragments to appear in code. It also supports a
very XPath-like notation for
use in extracting data from XML objects. So far, so good - all these
things are somewhat useful. However, there are serious problems with
the design of the extension.
If E4X objects were real objects, if there were a means of splicing a
sequence of child nodes into XML literal syntax, and if E4X supported
XML namespace prefixes properly, most of my objections would be dealt
with. As it stands, the overall verdict is “clunky at best”.
These are my main complaints:
It doesn’t do anything like Scheme’s unquote-splicing,
and so using E4X to produce XML objects is verbose, error-prone and
dangerous in concurrent settings.
There seems to be no way of splicing in a sequence of items -
I’d like to do something like the following:
function buildItems() {
return [<item>Hello</item>,
<item>World!</item>];
}
var doc = <mydocument>{buildItems()}</mydocument>;
and have doc contain
<mydocument>
<item>Hello</item>
<item>World!</item>
</mydocument>
What actually results is the more-or-less useless
<mydocument>Hello,World!</mydocument>
The closest I can get to the result I’m after is
function buildItems(n) {
n.mydocument += <item>Hello</item>;
n.mydocument += <item>World!</item>;
}
var doc = <mydocument></mydocument>;
buildItems(doc);
It’s full of redundant redundancy - it’s as verbose as XML, when you
can do so much better.
There’s no toXML() method (or similar) for use in
papering over the yawning chasm between the XML objects and the rest
of the language: you can’t even make a Javascript object able to
seamlessly render itself to XML.
The new types E4X introduces aren’t even proper objects -
they’re a whole new class of primitive datum!
Because they’re not proper objects, you can’t extend the system. You
ought to be able to implement an interface and benefit from the
language’s XPath searching and filtering operations. E4X is so close
to offering a comprehension facility for Javascript, but it’s been
short-sightedly restricted to a single class of non-extensible
primitives.
You can’t even construct XML tags programmatically! If the name of
the tag doesn’t appear literally in your Javascript code, you’re out
of luck (unless you resort to eval…) [[Update: I was wrong about this - you can write <{expr}> and have the result of evaluating expr substituted into the tag.]]
E4X XML objects have no notion of namespace prefixes (which are
required for quality implementations of XPath and anything to do
with XML signatures). Prefixes only turn up in the API as a means of
producing (namespaceURI,localname) pairs. This is actually how it
should be, but because there’s already broken software out there
that depends on prefix support, by not supporting prefixes properly
you preclude ECMAScript+E4X from being used for XML signatures or
ECMAScript-native XPath implementations.
In my opinion, E4X violates several programming language design
principles: most importantly, those of regularity, simplicity and
orthogonality, but also preservation of information, automation and
structure. SXML, perhaps in combination with eager comprehensions,
provides a far superior model for producing and consuming XML. Sadly,
there’s no real alternative for ECMAScript yet - we’re limited either
to library extensions, or to using the DOM without any syntactic or
library support at all.
June 24th, 2006
tonyg
In Ant 1.6, properties can be written in XML files. Can someone tell me why
<property name="some.property" value="some.value"/>
is more desirable than
some.property=some.value
?
Update: The import feature is what’s new in Ant 1.6 that makes this usage possible. So, the answer is, “because you can conditionally set properties in the imported files” (rather than conditionally evaluating property files). So really my gripe is (again) with the weird, stilted little language that is Ant.
September 22nd, 2005
mikeb
There’s a summary of an interesting discussion regarding semantics in HTML over on fantasai’s blog. Is the HR element only presentational or does it convey something about the content? It does seem to have a semantic role, but one that derives from the typographical convention of using a row of asterisks to mark a change of context. The interesting bit is what comes after the discussion on HR (and SPAN and DIV); fantasai doesn’t say it explicitly, but Hixie’s proposed reforms to the HTML standard are in the same vein as with HR: they clarify what an element means by referring to what the corresponding typographical convention means. Compare the Web Applications 1.0 draft to the XHTML 2.0 draft.
August 8th, 2005
mikeb
At last there appears to be a working implementation of XForms that is written in Javascript and runs entirely inside a browser: FormFaces. The clear separation of content from presentation, and the declarative nature of XForms have always appealed to me, but the lack of a browser-side implementation has so far put me off from using it. Perhaps it’s time for another try?
July 22nd, 2005
matthias