Archive for July, 2006
Reading through tonyg’s recent post I came across something I haven’t yet seen in use - inline XML within Javascript code. E4X, it seems, has landed. It is now available by default in Firefox and Rhino - other implementations will surely follow.
E4X, shorthand for ECMAScript for XML, is a nice language extension to Javascript adding native XML support. It adds XML types, a notation for literal XML and some basic operations. Previously, if you wanted to use XML in your Javascript code, you had two choices. Since XML has a textual representation, you could work with strings. This approach, however, is extremely error-prone, and is of limited use if you intend to do anything more sophisticated than just generating XML. The other approach is to use the XML DOM, which exposes the full power of XML using a consistent model, but is too verbose and so rather unpleasant to use.
Example: XML using strings / innerHTML
// Short, but notice how I forgot to close the paragraph
// Also, this is non-standard, and only works in HTML
myElement.innerHTML = '<p><b>Hello</b> <i>World</i>';
Example: XML using the DOM
// That must be one of the longest hello world
// examples I've ever written
var paragraph = document.createElement('p');
var bold = document.createElement('b');
var hello = document.createTextNode('Hello');
bold.appendChild(hello);
var italic = document.createElement('i');
var world = document.createTextNode('World');
italic.appendChild(world);
var space = document.createTextNode(' ');
paragraph.appendChild(bold);
paragraph.appendChild(space);
paragraph.appendChild(italic);
myElement.appendChild(paragraph);
As it happens, I am working on something that requires quite a lot of DOM manipulation within the browser, and, tired of constructing XML using the DOM API, I set out to give the new E4X capabilities of Firefox 1.5 a try. The disappointing reality, I soon found out, is that while E4X is very much present, it can’t be used for accessing or creating DOM elements. So if you plan on parsing some XML data, or generating XML from your program, you can use E4X, but DOM manipulation, arguably the most important activity involving XML in a browser, is not served by this new extension at all.
Example: How E4X could be used with the DOM
// This is structured XML, notice how there are no quotes
var p_xml = <p><b>Hello</b> <i>World</i></p>;
// But unfortunately you can't do that
var p_element = document.createElement(p_xml);
myElement.appendChild(p_element);
Javascript is a complete, general-purpose language, but in practice, it is being used exclusively as an extension for host environments. In Firefox, for example, it is used for adding program logic to the browser’s display formats - HTML, XUL and SVG. These formats can be expressed in text, but in order to manipulate them you need to access them using the DOM. For HTML, Firefox adopted the nasty innerHTML non-standard extension, which allows the user to access the contents of a node as text. Fortunately, this extension doesn’t work with non-HTML elements. E4X could have been the perfect replacement - a compromise between using the dumb textual representation and the structured, but counter-intuitive DOM.
Why doesn’t Firefox provide a way to construct and manipulate DOM elements using E4X? It’s hard to blame the Mozilla developers, given that the ECMA standard does not include any mention of the DOM or how to interact with it. Any extension they came up with would end up being the next-generation innerHTML non-standard.
This failure of the E4X standard, together with tonyg’s previous critique of E4X, as well as other rumours from the Javascript development arena, has me wondering whether the standardisation efforts by ECMA have greatly benefited the language and its active community.
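Without an E4X-to-DOM bridge, some of the DOM's verbosity can still be hidden behind a small helper written in plain Javascript. The following is only a sketch: the `build` function and its nested-array notation are hypothetical, invented here for illustration, and are not part of E4X, Firefox or any library.

```javascript
// Hypothetical helper: build a DOM subtree from nested arrays.
// Assumed notation: a string becomes a text node, and
// [tagName, child, child, ...] becomes an element with those children.
function build(doc, spec) {
  if (typeof spec === 'string') {
    return doc.createTextNode(spec);
  }
  var element = doc.createElement(spec[0]);
  for (var i = 1; i < spec.length; i++) {
    element.appendChild(build(doc, spec[i]));
  }
  return element;
}

// In a browser, the hello world example would then read:
// myElement.appendChild(
//   build(document, ['p', ['b', 'Hello'], ' ', ['i', 'World']]));
```

Passing the document in as an argument keeps the helper independent of the host environment, so it can be exercised outside a browser with a stand-in object.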
July 24th, 2006
Tom Berger
What’s the right way to create a subclass in JavaScript?
Wrong question, say the JavaScript advocates. JavaScript isn’t one of those fuddy-duddy old class-based languages. It’s something much more exciting: a prototype-based language! So when you work with JavaScript, remember never to refer to “classes”, because JavaScript doesn’t have them, and it only shows you’re stuck in the old way of thinking.
I’m sure that these sentiments have done enormous harm to the reputations of real prototype-based languages, so let me banish them right here. JavaScript is not a prototype-based language; it most closely resembles a class-based language, but all its mechanisms for doing the work of a class-based language are horribly broken, which is why its advocates try to pretend it’s something else.
July 24th, 2006
Paul Crowley
I’ve uploaded a snapshot of Icing,
including its dependent xml.pipeline library. We internally
use a piece of software called use.jar (David Ireland’s
Component Manager) to manage dependencies on external components, but
it’s not required for this interim release: you’ll need to have a Tomcat
v4.1.30 installation available instead. Newer versions of Tomcat
probably work, but are not guaranteed to.
Download icing-20060721.tar.gz,
and unpack it somewhere convenient. Installation instructions are in
the README file in the icing/icing-20060721
directory.
Do note that this is a snapshot release, and as such is full of rough
edges!
Our plans for Icing include splitting it into many small
loosely-coupled components, each published individually. As it stands,
it’s a little bit too tightly-coupled and framework-like for
comfort. We also want to fold in several minor and some fairly
substantial improvements that stem from the project work we’ve done
since we first identified Icing as a potentially-reusable piece of
software.
July 21st, 2006
tonyg
Haskell’s
QuickCheck is a very
neat tool for automated testing. One specifies properties that one
would like a program to satisfy, and generators for test data,
usually involving some form of randomisation. QuickCheck then uses the
generators to produce test cases and check the properties against
them.
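The idea can be sketched in a few lines of Javascript. This is a toy illustration of property-based testing, not QuickCheck's API: the names `checkProperty`, `randomIntArray` and `reverseTwiceIsIdentity` are made up for this example, and real QuickCheck adds much more (for instance, control over the size of generated data).

```javascript
// Run a property against a number of generated test cases and report
// the first case for which it does not hold.
function checkProperty(generate, property, runs) {
  for (var i = 0; i < runs; i++) {
    var testCase = generate();
    if (!property(testCase)) {
      return { ok: false, counterexample: testCase };
    }
  }
  return { ok: true };
}

// Generator: arrays of up to nine integers between -100 and 100.
function randomIntArray() {
  var length = Math.floor(Math.random() * 10);
  var result = [];
  for (var i = 0; i < length; i++) {
    result.push(Math.floor(Math.random() * 201) - 100);
  }
  return result;
}

// Property: reversing an array twice yields the original array.
function reverseTwiceIsIdentity(xs) {
  var twice = xs.slice().reverse().reverse();
  return twice.length === xs.length &&
    twice.every(function (x, i) { return x === xs[i]; });
}

var result = checkProperty(randomIntArray, reverseTwiceIsIdentity, 100);
```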
The original QuickCheck was designed to test purely functional code
only. However, the project I am working on contains a fair amount of
imperative code, most of it performing operations on a database. Is it
possible to employ QuickCheck for testing this code?
July 20th, 2006
matthias
In one of our projects I needed to do some processing on data stored
in a PostgreSQL database. The data contains timestamps but the
processing requires time to be represented as seconds since the
epoch. What to do?
Generally, date and time processing is a major headache. There are just way
too many opportunities to get things wrong. In particular, obtaining
the right result when the data has been passed through several layers
of conversion - database, database driver, o/r layer, programming
language - is fraught with difficulty. So I usually try to do the
conversions as close to the source as possible. In this instance that
means doing the conversion in the database. The quick solution is to
construct an appropriate SQL query that does the conversion. A better
idea though is to create a view. Here’s what I ended up with:
CREATE VIEW intervals AS
    SELECT t.id as id,
           t.user_id as user_id,
           t.task_id as task_id,
           -- The cast target type is missing from the original listing;
           -- INTEGER is an assumption.
           CAST(EXTRACT(EPOCH FROM t.start_time AT TIME ZONE 'UTC') AS INTEGER) as start_time,
           CAST(EXTRACT(EPOCH FROM t.end_time AT TIME ZONE 'UTC') AS INTEGER) as end_time
    FROM task_time t;
This works all very well as long as all we want to do is retrieve
data. What about updates? It turns out that the PostgreSQL rule system
allows us to make the above view behave like an ordinary table, with
insert, update and delete all working as expected. The
documentation
of this feature is excellent, and with its help it took me just a few
minutes to produce the following:
CREATE RULE intervals_ins AS ON INSERT TO intervals
    DO INSTEAD
    INSERT INTO task_time VALUES(
        DEFAULT,
        NEW.user_id,
        NEW.task_id,
        TIMESTAMP 'epoch' + NEW.start_time * INTERVAL '1 second',
        TIMESTAMP 'epoch' + NEW.end_time * INTERVAL '1 second');

CREATE RULE intervals_upd AS ON UPDATE TO intervals
    DO INSTEAD
    UPDATE task_time
       SET id = NEW.id,
           user_id = NEW.user_id,
           task_id = NEW.task_id,
           start_time = TIMESTAMP 'epoch' + NEW.start_time * INTERVAL '1 second',
           end_time = TIMESTAMP 'epoch' + NEW.end_time * INTERVAL '1 second'
     WHERE id = OLD.id;

CREATE RULE intervals_del AS ON DELETE TO intervals
    DO INSTEAD
    DELETE FROM task_time
     WHERE id = OLD.id;
With the above in place my code simply accesses the intervals table
for all operations that previously involved the task_time
table, and all time format conversions are done behind the scenes in
the database.
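When checking the numbers such a view returns, it can help to reproduce the same conversion outside the database. The sketch below does so in Javascript, assuming the timestamps are in UTC and written in ISO 8601 form; the function names `toEpochSeconds` and `fromEpochSeconds` are hypothetical and belong to no library.

```javascript
// Convert an ISO 8601 UTC timestamp (e.g. '2006-07-20T00:00:00Z')
// to whole seconds since the epoch.
function toEpochSeconds(isoUtcTimestamp) {
  return Math.floor(Date.parse(isoUtcTimestamp) / 1000);
}

// Convert seconds since the epoch back to an ISO 8601 UTC timestamp.
function fromEpochSeconds(seconds) {
  return new Date(seconds * 1000).toISOString();
}

var start = toEpochSeconds('2006-07-20T00:00:00Z');
var roundTrip = fromEpochSeconds(start);
```

Javascript dates count milliseconds rather than seconds, hence the factor of 1000 in both directions.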
July 20th, 2006
matthias
A couple of weeks ago I picked up Chris Double’s server-side
javascript implementation, which uses the Mozilla Project’s Rhino Javascript environment
with Jetty to provide a
Javascript-controlled Java Servlet webserver.
The code’s available both for
browsing and for darcs download:
darcs get http://www.lshift.net/~tonyg/javascript-server/
After adding support for Jetty’s SessionHandler class to
Chris’s example.js, I downloaded the prototype.js Javascript utility
library and got it running in a server-side
environment (see footnote 1). The next step was using Rhino’s continuation
support to implement the equivalent of PLT Scheme’s send/suspend/dispatch
(also seen in Seaside,
under-the-covers as part of the HTML-rendering and workflow aspects of
the system, and in SISCWeb, which is at the
core of our Icing
library).
Here’s a little workflow, roughly equivalent to Seaside’s Counter
application:
sv.addEntryPoint
  ("/count",                                      // [1]
   function (servlet, bindings) {
     var finalC = servlet.withState
       (10,                                       // [2]
        function (c) {                            // [3]
          while (                                 // [4]
            servlet.sendAndDispatch
              (function (embedUrl) {              // [5]
                 servlet.replyHtml
                   (doc("Counter",
                        <>
                          <p>{c.value}</p>
                          <p>
                            <a href={embedUrl(function(){
                                       c.value++; return true})}
                               >More</a>;
                            <a href={embedUrl(function(){
                                       c.value--; return true})}
                               >Less</a>;
                            <a href={embedUrl(function(){
                                       return false})}
                               >Stop</a>;
                          </p>
                        </>));
               }))
          {
            // Nothing to do in the body of the loop.
          }
          return c.value;                         // [6]
        });
     servlet.replyHtml(doc("Bye!", <p>Bye! {finalC}</p>));
                                                  // [7]
   });
Points of interest:
- [1] is where we specify the URL path to this workflow.
- [2] and [3] are about preserving state across use of the
back button, about which more below.
- [4] is the point at which control will resume when the user clicks
on any of the links produced by the embedUrl argument to the
function given to sendAndDispatch.
- [5] is the function for producing a document for the user containing
links (generated by embedUrl) that cause the workflow to
resume at [4].
- [6] is the point at which one of the embedded link-handlers in [5]
has returned false to [4], causing the while-loop to
terminate. At this point the state held in c is extracted and
the stateful part of the workflow is over.
- [7] is where the workflow finally ends, because the final document
sent to the user wasn’t sent from within sendAndDispatch and
didn’t contain any embedded links to a continuation.
Javascript is a little like Scheme - but not enough like Scheme to
avoid the pitfalls of using ordinary local variables in a web
workflow. The problem is that there are two ways you might want local
variables to behave as the user back-and-forwards around your
workflow, both perfectly reasonable and appropriate at different
times:
- the contents of variables could be unshared across stages of the
workflow, so that backing up and proceeding again from an earlier
point can run without being affected by any of the decisions the user
has used the back button to, in effect, undo; and
- the contents could be shared across stages of the workflow, so that
the user feels like he or she is affecting some real state in the
server, and so that the different pages in the workflow appear to all
be affecting this separate real object.
The first option seems to me more functional in style, and the second
more object-oriented.
To produce the second, object-oriented effect using these Javascript
servlets, simply declare variables as Javascript locals and assign to
them. The first is trickier: all variables are mutable, and there’s no
pleasant syntax for functional-style rebinding of variables, so I’ve
resorted to the withState method seen in the example above.
The basic idea is that we should reify functional variables (since
they’re the exception rather than the rule in Javascript; in Scheme,
we’d probably reify the mutable ones!) and use a system very much like
Scheme’s dynamic-wind
to make sure the correct values are visible at each stage in the
workflow. Here’s a more focussed example of withState usage:
var finalResult =
servlet.withState(initialValue,
function (stateCell) {
// … code using stateCell …
return finalValue;
});
The initialValue gets placed into a fresh managed cell, which
is bound to stateCell for the duration of the function. The
code in the function should access and modify
stateCell.value, and the values will be tracked automatically
across the forward and back buttons. The final result of the function
is used as the final result of the whole withState call. Once
withState returns, stateCell is no longer
automatically tracked - it has gone out of scope, in a way.
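To make the "tracked across the forward and back buttons" behaviour concrete, here is a toy model of the idea. It is not the javascript-server implementation: `makeCell`, `snapshot` and `restore` are hypothetical names, and the real system ties the saving and restoring to continuations rather than to explicit calls.

```javascript
// A managed cell is just an object with a mutable value.
function makeCell(initialValue) {
  return { value: initialValue };
}

// Record the current values of the managed cells; conceptually this
// happens whenever a page is sent to the user.
function snapshot(cells) {
  return cells.map(function (cell) { return cell.value; });
}

// Reinstate recorded values; conceptually this happens when the user
// resumes the workflow from an older page.
function restore(cells, saved) {
  cells.forEach(function (cell, i) { cell.value = saved[i]; });
}

var counter = makeCell(10);
var page1 = snapshot([counter]);  // first page shows 10
counter.value++;                  // user clicks "More"
var page2 = snapshot([counter]);  // second page shows 11
counter.value++;                  // user clicks "More" again
restore([counter], page1);        // user goes back to the first page...
counter.value++;                  // ...and clicks "More" there
```

After the final step the counter holds 11 rather than 13: the two clicks made after the first page were, in effect, undone by going back. An ordinary shared local variable would have kept counting.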
Footnote 1: Tricks were required to get prototype.js running - but
they were as simple as defining document = {}; window = {};
navigator = {};.
July 18th, 2006
tonyg
For a recent project, we developed support for sending
automatically-generated HTML emails. Now, most people do this by
including a message body with MIME-type
text/html. For extra points, sometimes there’s also a
text/plain part alongside the HTML in a
multipart/alternative container.
The problem with doing things this way is that you can’t include any
images or other resources (such as CSS) as separate parts of the email
linked to from the main HTML body-part. For that, you need to use the
multipart/related
MIME-type. Unfortunately, few commonly-used email clients render
multipart/related HTML-plus-resource aggregations well.
We only tried the arrangement where the multipart/related,
containing the main HTML page and its associated resources, was a
sibling of the text/plain part within the
multipart/alternative container. The inverse arrangement,
with the multipart/alternative as the main document within
the multipart/related part, is something we have yet to
experiment with.
Here’s a picture of the structure of our initial attempts:
multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html
      +-- image/gif
      +-- text/css
This worked reasonably well in Thunderbird and Outlook 2002,
but we had consistent reports from our customer that the images and
stylesheet would randomly fail to display in Outlook 2003 (SP2). After
lots of mucking around trying to get Outlook to either work reliably
or fail reliably, we gave up on that line and instead simplified the
structure of our emails, putting the CSS styling inline in the HTML
HEAD element:
multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css inline in HEAD)
      +-- image/gif
This didn’t work particularly well, either: it seems many email
clients ignore styles set in the HEAD element. Finally, we
moved to applying CSS styling inline, using a style attribute
on each styled element. We were able to use an XSLT transformation to
allow us to write clean HTML and apply the CSS style
attribute automatically. The final structure of the emails we sent:
multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css copied on to each element!)
      +-- image/gif
This seems to work more-or-less reliably across
- Thunderbird
- Outlook 2002
- Outlook 2003 SP2
- Google Gmail
- MS Hotmail
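For readers who want to see what that final structure looks like on the wire, the following sketch assembles it as raw MIME text. It is an illustration only, not the code we used: the helper names `part` and `multipart`, the boundary strings, the `logo` content ID and the placeholder image data are all invented here, and a real message would also need the usual top-level headers (From, To, Subject, MIME-Version) and properly generated unique boundaries.

```javascript
// One MIME part: headers, a blank line, then the body.
function part(contentType, body, extraHeaders) {
  var headers = ['Content-Type: ' + contentType].concat(extraHeaders || []);
  return headers.join('\r\n') + '\r\n\r\n' + body;
}

// A multipart container: each child is preceded by a boundary line,
// and the whole is closed by the boundary followed by "--".
function multipart(subtype, boundary, parts, extraParams) {
  var contentType = 'multipart/' + subtype + '; boundary="' + boundary + '"' +
    (extraParams ? '; ' + extraParams : '');
  var body = parts.map(function (p) {
    return '--' + boundary + '\r\n' + p + '\r\n';
  }).join('') + '--' + boundary + '--';
  return part(contentType, body);
}

// Styles sit on each element; the image is referenced by its Content-ID.
var html = '<p style="color: #333333">Hello <img src="cid:logo"></p>';

var message = multipart('alternative', 'alt-boundary', [
  part('text/plain; charset=utf-8', 'Hello'),
  multipart('related', 'rel-boundary', [
    part('text/html; charset=utf-8', html),
    part('image/gif', 'BASE64_IMAGE_DATA_PLACEHOLDER', [
      'Content-Transfer-Encoding: base64',
      'Content-ID: <logo>'
    ])
  ], 'type="text/html"')
]);
```

The text/plain part comes first inside multipart/alternative because clients are expected to prefer the last alternative they can display.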
If I were to do it all again, I’d give serious consideration to the
traditional non-multipart text/html solution with images
hosted by some public-facing web server. We managed to get our
multipart-HTML-emails working acceptably, but only by the skin of our
teeth.
July 18th, 2006
tonyg
We wanted to add a “search this site” function to a client’s website
but did not have the time to study the 200+ existing ways of doing
this. One obvious candidate was the
“Microsoft Indexing Service” (or “Index Server”, IS), which fits well
with the software running the existing site (IIS) and can easily be
extended to search within MS Office and PDF documents.
But there is a problem with using IS for this: IS can only index files
on a local or remote file system; it does not crawl a website.
In our case that is not good enough because the content lives in a
database, and we have to follow links like http://mysite.com?page=42.
Moreover, we wanted to make sure exactly the content exported through
HTTP is indexed, no more, no less.
The solution we came up with works like this:
- Use a standard webcrawler to download a
copy of the site through HTTP and store it on the local filesystem of
the server.
- Use Indexing Service to index the local copy of the site.
- Use a small hashtable for mapping the filenames returned by a
query back into URLs.
This cleanly separates the webcrawl and the indexing, and the search is
entirely ignorant about the (possibly heterogeneous and complicated)
software architecture of the site.
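The third step, mapping filenames back to URLs, is the only piece of glue that needs writing. A minimal sketch follows; the names `buildFileMap` and `resultsToUrls` and the `page-N.html` naming scheme are assumptions made for this example, not what the crawler or Indexing Service actually produce.

```javascript
// Build the lookup table while saving crawled pages: each URL is stored
// under a generated local filename (hypothetical naming scheme).
function buildFileMap(urls) {
  var fileToUrl = {};
  urls.forEach(function (url, index) {
    fileToUrl['page-' + index + '.html'] = url;
  });
  return fileToUrl;
}

// Translate the filenames returned by a search query back into URLs,
// dropping any file that did not come from the crawl.
function resultsToUrls(fileToUrl, filenames) {
  return filenames
    .filter(function (name) {
      return Object.prototype.hasOwnProperty.call(fileToUrl, name);
    })
    .map(function (name) { return fileToUrl[name]; });
}

var fileMap = buildFileMap([
  'http://mysite.com?page=41',
  'http://mysite.com?page=42'
]);
var hits = resultsToUrls(fileMap, ['page-1.html', 'unknown.html']);
```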
So far it is just a prototype, but it seems to work fine.
July 17th, 2006
sebastian
It’s the usual story: there’ve been other demands on my time (projects
here at LShift, among many other things), and so the release of Icing
has suffered.
The good news is that I’ve been given some time to package it up and
make it available, and barring unexpected interruptions, I ought to
have something presentable by the end of the week.
July 17th, 2006
tonyg
Whilst projects like Xen and new hardware extensions to CPUs from Intel and AMD allow multiple OSes to run on the same machine at the same time, for me, there are currently few cases where I need this. I work under Linux and all I need is virtualisation to run multiple Linuxes at the same time. Also, virtualisation at the level of Xen requires that you set hard disc space and RAM for each running OS instance: the instances don’t share resources very well.
Linux VServer is virtualisation at a different level: there is only one Linux kernel ever running, but a chroot-on-steroids-like system ensures that you can start up multiple instances of Linux and they do not interfere with each other in any way. However, because only one kernel is running, the multiple instances share resources such as RAM and hard disc partitions much more effectively. Once you have some vservers up and running, they can be cloned, moved between machines, started and stopped easily, and generally manipulated with little effort. You can do private networking between your vservers, and you can even get X up and running inside a vserver.
Once this is all up and running, it makes migration between different services very much easier. For example, last week I upgraded our bugzilla installation. In the past I’ve tended to upgrade our main installation in place, which has been a bad idea in several cases. So this time, I copied our current installation onto a clean vserver and checked it worked. I then cloned that vserver and performed the upgrade on the clone, fixing everything that broke. This meant that at all points I had a working copy of the original installation to refer to, and that I could make sure I got the upgraded version at least to the same level of functionality as the old version before rolling it out on top of our main installation. The result was that I knew all the “gotchas” of the upgrade before doing it on the main installation, and consequently it went very smoothly. Almost as important, now that the upgrade is complete, I can quite happily delete the bugzilla vservers as they’re no longer needed: because of the total separation of the vservers, this is very easy (much easier than trying to uninstall packages and delete databases), and it means that if you use vservers, you never have your main working environment polluted by the software you are working on.
It rather looks like I’ll be putting vserver on every machine I install from now on…
July 17th, 2006
matthew