Archive for July, 2006

E4X and the DOM

Reading through tonyg’s recent post I came across something i haven’t yet seen in use - inline XML within Javascript code. E4X, it seems, has landed. It is now available by default in Firefox and Rhino - other implementation will surely follow.

E4X, shorthand for ECMAScript for XML is a nice language extension to Javascript adding native XML support. It adds XML types, a notation for literal XML and some basic operations. Previously, if you wanted to use XML in your Javascript code, you had two choices. Since XML has a textual representation, you could work with strings. This approach, however, is extremely error-prone, and is of limited use if you intend to do anything more sophisticted than just generating XML. The other approach is to use the XML DOM, which exposes the full power of XML using a consistent model, but is too verbose and so rather unpleasant to use.

Example: XML using strings / innerHTML

// Short, but notice how I forgot to close the paragraph
// Also, this is non-standard, and only works in HTML
myElement.innerHTML = '<p><b>Hello</b> <i>World</i>';

Example: XML using the DOM

// That must be one of the longest hello world
// examples I've ever written
var paragraph = document.createElement('p');
var bold = document.createElement('b');
var hello = document.createTextNode('Hello');
bold.appendChild(hello);
var italic = document.createElement('i');
var world = document.createTextNode('World');
italic.appendChild(world);
var space = document.createTextNode(' ');
paragraph.appendChild(bold);
paragraph.appendChild(space);
paragraph.appendChild(world);
myElement.appendhChild(paragraph);

As it happens, I am working on something that requires quite a lot of DOM manipulation within the browser, and tired of constructing XML using the DOM API I set to give the new E4X capabilities of Firefox 1.5 a try. The dissapointing reality, I soon found out, is that while E4X is very much present, it can’t be used for accessing or creating DOM elements. So if you plan on parsing some XML data, or generating XML from your program you can use E4X, but DOM manipulation, arguably the most important activity involving XML in a browser is not served by this new extension at all.

Example: How E4X could be used with the DOM

// This is structured XML, notice how there are no quotes
var p_xml =  <p><b>Hello</b> <i>World</i><p>;
// But unfortunately you can't do that
var p_element = document.createElement(p_xml);
myElement.appendChild(p_element)

Javascript is a complete, general-purpose language, but in practice, it is being used exclusively as an extension for host environments. In Firefox, for example, it is used for adding program logic to the browser’s display formats - HTML, XUL and SVG. These formats can be expressed in text, but in order to manipulate them you need to access them using the DOM. For HTML, firefox adopted the nasty innerHTML non-standard extension, which allows the user to access the contents of a node as text. Fortunately, this extension doesn’t work with non-HTML elements. E4X could have been the perfect replacement - a compromise between using the dumb textual representation and the structured, but counter-intuitive DOM.

Why doesn’t Firefox provide a way to construct and manipulate DOM elements using E4X? It’s hard to blame the mozilla developers, given that the ECMA standard does not include any mention of the DOM or how to interact with it. Any extension they would have come up with would end being the next generation innerHTML non-standard.

This failure of the E4X standard, together with tonyg’s previous critique of E4X, as well as other rumours from the Javascript development arena have me wondering whether the standartisation efforts by ECMA have greatly benefited the language and its active community.

1 comment July 24th, 2006 Tom Berger

Subclassing in JavaScript, part 1

What’s the right way to create a subclass in JavaScript?

Wrong question, say the JavaScript advocates. JavaScript isn’t one of those fuddy-duddy old class-based languages. It’s something much more exciting: a prototype-based language! So remember, when you work with JavaScript, remember never to refer to “classes”, because JavaScript doesn’t have them, and it only shows you’re stuck in the old way of thinking.

I’m sure that these sentiments have done enormous harm to the reputations of real prototype-based languages, so let me banish it right here. JavaScript is not a prototype based language; it most closely resembles a class-based language, but all its mechanisms for doing the work of a class-based language are horribly broken, which is why its advocates try to pretend it’s something else.

Continue Reading 9 comments July 24th, 2006 Paul Crowley

Icing snapshot 20060721

I’ve uploaded a snapshot of Icing, including its dependent xml.pipeline library. We internally use a piece of software called use.jar (David Ireland’s Component Manager) to manage dependencies on external components, but it’s not required for this interim release: you’ll need to have a Tomcat v4.1.30 installation available instead. Newer versions of Tomcat probably work, but are not guaranteed to.

Download icing-20060721.tar.gz, and unpack it somewhere convenient. Installation instructions are in the README file in the icing/icing-20060721 directory.

Do note that this is a snapshot release, and as such is full of rough edges!

Our plans for Icing include splitting it into many small loosely-coupled components, each published individually. As it stands, it’s a little bit too tightly-coupled and framework-like for comfort. We also want to fold in several minor and some fairly substantial improvements that stem from the project work we’ve done since we first identified Icing as a potentially-reusable piece of software.

Add comment July 21st, 2006 tonyg

QuickChecking imperative code

Haskell’s QuickCheck is a very neat tool for automated testing. One specifies properties that one would like a program to satisfy, and generators for test data, usually involving some form of randomisation. QuickCheck then uses the generators to produce test cases and check the properties against them.

The original QuickCheck was designed to test purely functional code only. However, the project I am working on contains a fair amount of imperative code, most of it performing operations on a database. Is it possible to employ QuickCheck for testing this code?

Continue Reading Add comment July 20th, 2006 matthias

Updateable views in PostgreSQL

In one of our projects I needed to do some processing on data stored in a PostgreSQL database. The data contains timestamps but the processing requires time to be represented as seconds since the epoch. What to do?

Generally, date&time processing is major headache. There are just way too many opportunities to get things wrong. In particular, obtaining the right result when the data has been passed through several layers of conversion - database, database driver, o/r layer, programming language - is fraught with difficulty. So I usually try to do the conversions as close to the source as possible. In this instance that means doing the conversion in the database. The quick solution is to construct an appropriate SQL query that does the conversion. A better idea though is to create a view. Here’s what I ended up with:

CREATE VIEW intervals AS
    SELECT t.id as id,
           t.user_id as user_id,
           t.task_id as task_id,
           CAST(EXTRACT(EPOCH FROM t.start_time AT TIME ZONE 'UTC')) as start_time,
           CAST(EXTRACT(EPOCH FROM t.end_time AT TIME ZONE 'UTC') as end_time
    FROM task_time t;

This works all very well as long as all we want to do is retrieve data. What about updates? It turns out that the PostgreSQL rule system allows us to make the above view behave like an ordinary table, with insert, update and delete all working as expected. The documentation of this feature is excellent, and with its help it took me just a few minutes to produce the following:

CREATE RULE intervals_ins AS ON INSERT TO intervals
    DO INSTEAD
    INSERT INTO task_time VALUES(
        DEFAULT,
        NEW.user_id,
        NEW.task_id,
        TIMESTAMP 'epoch' + NEW.start_time * INTERVAL '1 second',
        TIMESTAMP 'epoch' + NEW.end_time * INTERVAL '1 second');

CREATE RULE intervals_upd AS ON UPDATE TO intervals
    DO INSTEAD
    UPDATE task_time
       SET id = NEW.id,
           user_id = NEW.user_id,
           task_id = NEW.task_id,
           start_time = TIMESTAMP 'epoch' + NEW.start_time * INTERVAL '1 second',
           end_time = TIMESTAMP 'epoch' + NEW.end_time * INTERVAL '1 second'
    WHERE id = OLD.id;

CREATE RULE intervals_del AS ON DELETE TO intervals
    DO INSTEAD
    DELETE FROM task_time
    WHERE id = OLD.id;

With the above in place my code simply accesses the intervals table for all operations that previously involved the task_time table, and all time format conversions are done behind the scenes in the database.

Add comment July 20th, 2006 matthias

A Rhino at the Seaside?

A couple of weeks ago I picked up Chris Double’s server-side javascript implementation, which uses the Mozilla Project’s Rhino Javascript environment with Jetty to provide a Javascript-controlled Java Servlet webserver.

The code’s available both for browsing and for darcs download:

darcs get http://www.lshift.net/~tonyg/javascript-server/

After adding support for Jetty’s SessionHandler class to Chris’s example.js, I downloaded the prototype.js Javascript utility library and got it running in a server-side environment1. The next step was using Rhino’s continuation support to implement the equivalent of PLT Scheme’s send/suspend/dispatch (also seen in Seaside, under-the-covers as part of the HTML-rendering and workflow aspects of the system, and in SISCWeb, which is at the core of our Icing library).

Here’s a little workflow, roughly equivalent to Seaside’s Counter application:

sv.addEntryPoint
("/count", // [1]
 function (servlet, bindings) {
     var finalC = servlet.withState
     (10, // [2]
      function (c) { // [3]
          while ( // [4]
                 servlet.sendAndDispatch
                 (function (embedUrl) { // [5]
                      servlet.replyHtml
                      (doc(”Counter”,
                           <>
                             <p>{c.value}</p>
                             <p>
                               <a href={embedUrl(function(){
                                        c.value++; return true})}
                                >More</a>;
                               <a href={embedUrl(function(){
                                        c.value–; return true})}
                                >Less</a>;
                               <a href={embedUrl(function(){
                                        return false})}
                                >Stop</a>;
                             </p>
                           </>));
                  }))
          {
            // Nothing to do in the body of the loop.
          }
          return c.value; // [6]
      });
     servlet.replyHtml(doc(”Bye!”, <p>Bye! {finalC}</p>));
     // [7]
 });

Points of interest:

  • [1] is where we specify the URL path to this workflow.
  • [2] and [3] are about preserving state across use of the back button, about which more below.
  • [4] is the point at which control will resume when the user clicks on any of the links produced by the embedUrl argument to the function given to sendAndDispatch.
  • [5] is the function for producing a document for the user containing links (generated by embedUrl) that cause the workflow to resume at [4].
  • [6] is the point at which one of the embedded link-handlers in [5] has returned false to [4], causing the while-loop to terminate. At this point the state held in c is extracted and the stateful part of the workflow is over.
  • [7] is where the workflow finally ends, because the final document sent to the user wasn’t sent from within sendAndDispatch and didn’t contain any embedded links to a continuation.

Javascript is a little like Scheme - but not enough like Scheme to avoid the pitfalls of using ordinary local variables in a web workflow. The problem is that there are two ways you might want local variables to behave as the user back-and-forwards around your workflow, both perfectly reasonable and appropriate at different times:

  • the contents of variables could be unshared across stages of the workflow, so that backing up and proceeding again from an earlier point can run without being affected by any of the decisions the user has used the back button to, in effect, undo; and

  • the contents could be shared across stages of the workflow, so that the user feels like he or she is affecting some real state in the server, and so that the different pages in the workflow appear to all be affecting this separate real object.

The first option seems to me more functional in style, and the second more object-oriented.

To produce the second, object-oriented effect using these Javascript servlets, simply declare variables as Javascript locals and assign to them. The first is trickier: all variables are mutable, and there’s no pleasant syntax for functional-style rebinding of variables, so I’ve resorted to the withState method seen in the example above.

The basic idea is that we should reify functional variables (since they’re the exception rather than the rule in Javascript; in Scheme, we’d probably reify the mutable ones!) and use a system very much like Scheme’s dynamic-wind to make sure the correct values are visible at each stage in the workflow. Here’s a more focussed example of withState usage:

var finalResult =
  servlet.withState(initialValue,
                    function (stateCell) {
                      // … code using stateCell …
                      return finalValue;
                    });

The initialValue gets placed into a fresh managed cell, which is bound to stateCell for the duration of the function. The code in the function should access and modify stateCell.value, and the values will be tracked automatically across the forward and back buttons. The final result of the function is used as the final result of the whole withState call. Once withState returns, stateCell is no longer automatically tracked - it has gone out of scope, in a way.


Footnote 1: Tricks were required to get prototype.js running - but they were as simple as defining document = {}; window = {}; navigator = {};.

1 comment July 18th, 2006 tonyg

HTML Email is hard to get right

For a recent project, we developed support for sending automatically-generated HTML emails. Now, most people do this by including a message body with MIME-type text/html. For extra points, sometimes there’s also a text/plain part alongside the HTML in a multipart/alternative container.

The problem with doing things this way is that you can’t include any images or other resources (such as CSS) as separate parts of the email linked to from the main HTML body-part. For that, you need to use the multipart/related MIME-type. Unfortunately, few commonly-used email clients render multipart/related HTML-plus-resource aggregations well.

We only tried the arrangement where the multipart/related, containing the main HTML page and its associated resources, was a sibling of the text/plain part within the multipart/alternative container. The inverse arrangement, with the multipart/alternative as the main document within the multipart/related part, was something we have yet to experiment with.

Here’s a picture of the structure of our initial attempts:

multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html
      +-- image/gif
      +-- text/css

This worked reasonably well in Thunderbird and Outlook 2002, but we had consistent reports from our customer that the images and stylesheet would randomly fail to display in Outlook 2003 (SP2). After lots of mucking around trying to get Outlook to either work reliably or fail reliably, we gave up on that line and instead simplified the structure of our emails, putting the CSS styling inline in the HTML HEAD element:

multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css inline in HEAD)
      +-- image/gif

This didn’t work particularly well, either: it seems many email clients ignore styles set in the HEAD element. Finally, we moved to applying CSS styling inline, using a style attribute on each styled element. We were able to use an XSLT transformation to allow us to write clean HTML and apply the CSS style attribute automatically. The final structure of the emails we sent:

multipart/alternative
 |
 +-- text/plain
 +-- multipart/related
      |
      +-- text/html (with text/css copied on to each element!)
      +-- image/gif

This seems to work more-or-less reliably across

  • Thunderbird
  • Outlook 2002
  • Outlook 2003 SP2
  • Google Gmail
  • MS Hotmail

If I was to do it all again, I’d give serious consideration to the traditional non-multipart text/html solution with images hosted by some public-facing web server. We managed to get our multipart-HTML-emails working acceptably, but only by the skin of our teeth.

References:

4 comments July 18th, 2006 tonyg

How to provide ’search this site’ functionality?

We wanted to add a ’search this site’ function to a client’s website but did not have the time to study the 200+ existing ways of doing this. Perhaps using the “Microsoft Indexing Service” (or “Index Server”, IS), which fits well with the software running the existing site (IIS), can easily be extended to search within MS Office and PDF documents?

But there is a problem with using IS for this: IS can only index files on a local or remote file system, it does not crawl a website. In our case that is not good enough because the content lives in a database, and we have to follow links like http://mysite.com?page=42. Moreover, we wanted to make sure exactly the content exported through HTTP is indexed, no more no less.

The solution we came up with works like this:

  1. Use a standard webcrawler to download a copy of the site through HTTP and store it the local filesystem of the server.
  2. Use Indexing Service to index the local copy of the site.
  3. Use a small hashtable for mapping the filenames returned by a query back into URLs.

This cleanly separates the webcrawl and the indexing, and the search is entirely ignorant about the (possibly heterogeneous and complicated) software architecture of the site.

So far it is just a prototype, but it seems to work fine.

2 comments July 17th, 2006 sebastian

Why there’s been no news regarding Icing

It’s the usual story: there’ve been other demands on my time (projects here at LShift, among many other things), and so the release of Icing has suffered.

The good news is that I’ve been given some time to package it up and make it available, and barring unexpected interruptions, I ought to have something presentable by the end of the week.

Add comment July 17th, 2006 tonyg

Linux VServer: Cheap and Easy Virtualisation

Whilst projects like Xen and new hardware extensions to CPUs from Intel and AMD allow multiple OSes to run on the same machine at the same time, for me, there are currently few cases where I need this. I work under Linux and all I need is virtualisation to run multiple Linuxes at the same time. Also, virtualisation at the level of Xen requires that you set harddisc space and RAM for each running OS instance: the instances don’t share resources very well.

Linux VServer is virtualisation at a different level: there is only one Linux kernel ever running, but a chroot-on-steroids-like system ensures that you can start up multiple instances of linux and they do not interfere with each other in anyway possible. However, because it’s only one kernel running, the multiple instances do share resources such as RAM and harddisc partitions much more effectively. Having got some vservers up and running, they can be cloned, moved between machines, started and stopped easily and generally be manipulated very easily. You can do private networking between your vservers, and you can even get X up and running inside a vserver.

Once this is all up and running, it makes migration between different services very much easier. For example, last week I upgraded our bugzilla installation. In the past I’ve tended to upgrade our main installation in place which has been a bad idea in several cases. So this time, I copied our current installation onto a clean vserver and checked it worked. I then cloned that vserver and performed the upgrade on the clone, then fixing everything that broke. This meant that at all points I had a working copy of the original installation to refer to and that I could make sure I got the upgraded version at least to the same level of functionality as the old version before rolling it out on top of our main installation. The result was that I knew in advance all the “gotchas” of the upgrade before doing the upgrade on the main installation and consequently it went very smoothly. Almost as important is that as the upgrade is now complete, I can quite happily delete the bugzilla vservers as they’re no longer needed: because of the total separation of the vservers, this is very easy (much easier that trying to uninstall packages and delete databases) and it means that if you use vservers, you never have your main working environment polluted by the software you are working on.

It rather looks like I’ll be putting vserver on every machine I install from now on…

Add comment July 17th, 2006 matthew

Previous Posts

Calendar

July 2006
M T W T F S S
« Jun   Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Posts by Month

Posts by Category