technology from back to front

Rant: Yahoo doesn’t know what an email address is

Many websites refuse to accept email addresses of the form myusername+sometext@gmail.com, despite the fact that the +sometext is perfectly legitimate [1] and is an advertised feature Gmail offers for creating pseudo-single-use email addresses from a base email address.

My guess is that the developers of these sites think, because they’re either lazy or incompetent, that email addresses have more restrictions than they in fact have. It’s reasonable (and fairly easy) these days to check the syntax of the DNS part of an email address, because few people use non-DNS or non-SMTP transfer methods anymore, but the mailbox part is extremely flexible and hard to check accurately. A sane thing to do is just trust the user, and send a test mail to validate the address.
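A minimal sketch of that "trust the user" approach, in JavaScript: check only that there is a non-empty mailbox part and a plausibly DNS-shaped domain, then leave the real validation to a confirmation email. The function name looksLikeEmail and the exact domain rules are assumptions made for illustration; this is not a full RFC 2822 parser.

```javascript
// Hypothetical helper: deliberately lenient about the mailbox part.
function looksLikeEmail(address) {
  var at = address.lastIndexOf("@");
  if (at < 1) return false;               // need a non-empty mailbox part
  var domain = address.slice(at + 1);
  var labels = domain.split(".");
  if (labels.length < 2) return false;    // assumption: require a dotted domain
  // Each DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
  var label = /^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;
  return labels.every(function (l) { return label.test(l); });
}
```

Anything that passes this check would still need a test mail before being treated as a working address.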

I picked on Yahoo in the title of this post: Yahoo are by no means the only offender, but I just signed up for a Yahoo account, so for me they’re the most recent. Their signup form also refused to provide any guidance about why they were rejecting the form submission: I had to use my previous experience of sites wrongly rejecting valid email addresses to guess what the problem might be. Fail.


Footnote 1: According to my best reading of the relevant RFCs, anyway. See the definition of dot-atom in section 3.2.4 of RFC 2822, referenced in this context by section 3.4.1.

by tonyg on 17/03/09

Rant: Adventures with the Fisher Price My First Firewall

I’m writing this blog entry for therapeutic reasons. Everything you need to know is in the link below. Readers are invited to share the worst anti-features they have found in network devices by posting a comment.

I had a strange problem sending email from a host. I first discovered that trac couldn’t send messages via a remote smtp server. It would just hang indefinitely. So I decided it was better to set up exim on the local box, and have trac send mail using that - at least it wouldn’t hang.

Unfortunately, exim wouldn’t send messages either.

At this stage, we were using the same smtp server - exim was configured to use it as a smart host.

We discounted any firewall problems immediately, because we could establish a connection. We didn’t immediately notice that we didn’t get an initial message from the server. When we did, we assumed it was because the server wasn’t sending it for some reason, and started checking on things like DNS.

This got us nowhere.

Then I noticed that if I typed HELO into the connection I did get a response. Eventually I noticed I could type anything into the connection, and get the initial 220 back from exim.

At this point, I decided I would use tshark to check on what the smtp server was doing, and discovered that it was in fact sending the 220, and resending it a good few times too; it just never turned up at our end.

This turned my attention to the Zyxel firewall we were using.

It turns out that a ‘feature’ of the firewall designed to prevent spam prevented us from receiving anything from the server on the connection until we had sent something on the connection. This feature is particularly ridiculous, since most spam mail clients don’t bother to try and synchronize with the server, so only spam would get through while legitimate clients would not.
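To see why this deadlocks a well-behaved client, here is a toy model in JavaScript. Nothing here talks to a real network or reproduces the Zyxel firmware; the function simulateSession and its step model are purely hypothetical. It assumes a firewall that holds back server-to-client data until the client has sent something.

```javascript
// Toy model: returns what the client ends up seeing.
// clientWaitsForGreeting is true for a well-behaved client that waits for
// the server's 220 before sending, false for one that sends straight away.
function simulateSession(clientWaitsForGreeting) {
  var heldByFirewall = ["220 greeting"];  // in SMTP the server speaks first
  var clientHasSent = false;
  var received = [];

  for (var step = 0; step < 3; step++) {
    // Assumed firewall rule: release server data only after client traffic.
    if (clientHasSent) {
      received = received.concat(heldByFirewall);
      heldByFirewall = [];
    }
    var sawGreeting = received.length > 0;
    if (!clientHasSent && (sawGreeting || !clientWaitsForGreeting)) {
      clientHasSent = true;               // client sends HELO (or anything at all)
    }
  }
  return received.length > 0 ? "greeting received" : "hung waiting for 220";
}
```

In this model the polite client never sees the greeting, while the client that ignores the protocol gets straight through.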

We gather a firmware upgrade has solved this problem, but letting a firewall release into the wild without checking you could send email through it is a spectacular screw-up - enough to convince me never to buy from this brand again, anyway.

Thanks Simon, for dubbing this product the ‘Fisher Price My First Firewall’.

Thanks Lucas Beeler for blogging about it here.

Thanks Zyxel for wrecking my day.

by david on 10/09/08

Rant: Smalltalk vs. Javascript; Diff and Diff3 for Squeak Smalltalk

Many of my recent posts here have discussed the diff and diff3 code I wrote in Javascript. A couple of weekends ago I sat down and translated the code into Squeak Smalltalk. The experience of writing the “same code” for the two different environments let me compare them fairly directly.

To sum up, working with Smalltalk was much more pleasant than working with Javascript, and produced higher-quality code (in my opinion) in less time. It was nice to be reminded that there are some programming languages and environments that are actually pleasant to use.

The biggest win was Smalltalk’s collection objects. Where stock Javascript limits you to the non-polymorphic

for (var index = 0; index < someArray.length; index++) {
  var item = someArray[index];
  /* do something with item, and/or index */
}

Smalltalk permits

someCollection do: [:item | "do something with item"].

or, alternatively

someCollection withIndexDo:
    [:item :index | "do something with item and index"].

Smalltalk collections are properly object-oriented, meaning that the code above is fully polymorphic. The Javascript equivalent only works with the built-in, not-even-proper-object Arrays.
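For comparison, later versions of JavaScript did grow a polymorphic iteration protocol: ES5 added Array.prototype.forEach, and ES2015 added iterables and for...of. A minimal sketch, assuming an ES2015 engine; the Range class and the eachWithIndex helper are hypothetical names invented for this example.

```javascript
// Hypothetical collection that is not an Array but can still be iterated.
class Range {
  constructor(from, to) { this.from = from; this.to = to; }
  *[Symbol.iterator]() {
    for (let n = this.from; n <= this.to; n++) yield n;
  }
}

// Works on anything iterable, roughly like Smalltalk's withIndexDo:
// (except that the index here starts at 0, not 1).
function eachWithIndex(collection, callback) {
  let index = 0;
  for (const item of collection) callback(item, index++);
}

const seen = [];
eachWithIndex(new Range(3, 5), (item, index) => seen.push([index, item]));
eachWithIndex("ab", (item, index) => seen.push([index, item]));
```

The same helper handles a user-defined collection and a built-in string, which is the kind of polymorphism the Smalltalk version gets for free.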

Of course, I could use one of the many, many, many, many Javascript support libraries that are out there; the nice thing about Smalltalk is that I don’t have to find and configure an ill-fitting third-party bolt-on collections library, and that because the standard library is simple yet rich, I don’t have to worry about potential incompatibilities between third-party libraries, such as can occur in Javascript if you’re mixing and matching code from several sources.

Other points that occurred to me as I was working:

  • Smalltalk has simple, sane syntax; Javascript… doesn’t. (The number of times I get caught out by the semantics of “this” alone…!)
  • Smalltalk has simple, sane scoping rules; Javascript doesn’t. (O, for lexical scope!)
  • Smalltalk’s uniform, integrated development tools (including automated refactorings and an excellent object explorer) helped keep the code clean and object-oriented.
  • The built-in SUnit test runner let me develop unit tests alongside the code.

The end result of a couple of hours’ hacking is an implementation of Hunt-McIlroy text diff (that works over arbitrary SequenceableCollections, and has room for alternative diff implementations) and a diff3 merge engine, with a few unit tests. You can read a fileout of the code, or use Monticello to load the DiffMerge module from my public Monticello repository. [Update: Use the DiffMerge Monticello repository on SqueakSource.]

If Monticello didn’t already exist, it’d be a very straightforward matter indeed to build a DVCS for Smalltalk from here. I wonder if Spoon could use something along these lines?

It also occurred to me it’d be a great thing to use OMeta/JS to support the use of

<script type="text/smalltalk">"<![CDATA["
  (document getElementById: 'someId') innerHTML: '<p>Hello, world!</p>'
"]]>"</script>

by compiling it to Javascript at load-time (or off-line). Smalltalk would make a much better language for AJAX client-side programming.

by tonyg on 01/07/08

Rant: E4X: Not as awful as I thought

Long, long ago, I complained about various warts and infelicities in E4X, the ECMAScript extensions for generating and pattern-matching XML documents. It turns out that two of my complaints were not well-founded: sequence-splicing is supported, and programmatic construction of tags is possible.

Firstly (and I’m amazed I didn’t realise this at the time, as I was using it elsewhere), it’s not a problem at all to splice in a sequence of items, in the manner of Scheme’s unquote-splicing; here’s a working solution to the problem I set myself:

function buildItems() {
  return <>
           <item>Hello</item>
           <item>World!</item>
         </>;
}
var doc = <mydocument>{buildItems()}</mydocument>;

You can even use real Arrays (which is what I tried and failed to do earlier), by guerilla-patching Array.prototype:

Array.prototype.toXMLList = function () {
    var x = <container/>;
    for (var i = 0; i < this.length; i++) {
        x.appendChild(this[i]);
    }
    return x.children();
}
function buildItems() {
    return [<item>Hello</item>,
            <item>World!</item>].toXMLList();
}
var doc = <mydocument>{buildItems()}</mydocument>;

Programmatic construction of tags is done by use of the syntax for plain old unquote, in an unusual position: inside the tag’s angle-brackets:

var tagName = "p";
var doc = <{tagName}>test</{tagName}>;

So in summary, my original expectation that E4X should turn out to be very quasiquote-like wasn’t so far off the mark. It’s enough to get the basics done (ignoring for the minute the problems with namespace prefixes), but it’s still a bit of a bolt-on afterthought; it would have been nice to see it better integrated with the rest of the language.

by tonyg on 07/05/08

Rant: .NET is an endless supply of fascinating puzzles

In C, size_t is unsigned. In Java, there are no unsigned fixed-width pseudointegral types, so it can perhaps be forgiven for having an array’s length field be signed. In .NET, however, which has unsigned ints, an array’s length field is also signed. What could it possibly mean to have a length less than zero?

by tonyg on 19/09/07

Rant: Closing over context still not easy in mainstream languages, Film at 11

I find it fascinating that after so many decades of support for closures, we’re still stuck in a C-style mentality of passing function-pointers that take an explicit context argument rather than a proper closure object. Witness the design of .NET’s Type.FindInterfaces method:

public virtual Type[] FindInterfaces (TypeFilter filter,
                                      Object filterCriteria);

The TypeFilter argument is a delegate. The Object argument is context that the delegate may require! This is pretty much exactly the old-school C-style way of implementing closures:

/* Yes, pretty crude translation, I know */
TypeArray find_interfaces(int (*type_filter)(Type*, void*),
                          void *argument);

Smalltalk (and Lisps) would do it in the natural way, with a block (a closure):

someType selectInterfaces: [:interface | ... ]

Lisp 1.5, complete with support for lexical closures, appeared in 1959. It’s 2007. That’s forty-eight years.
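The difference is easy to show in JavaScript, which has closures. A sketch with made-up data: findInterfacesWithContext mimics the delegate-plus-context shape of the .NET signature, while the second version simply closes over the variable it needs. All function and variable names here are hypothetical.

```javascript
var interfaces = ["IDisposable", "IEnumerable", "IComparable"];

// C-style: the filter gets an explicit, untyped context argument.
function findInterfacesWithContext(types, filter, filterCriteria) {
  var result = [];
  for (var i = 0; i < types.length; i++) {
    if (filter(types[i], filterCriteria)) result.push(types[i]);
  }
  return result;
}
function nameStartsWith(type, criteria) {
  return type.indexOf(criteria) === 0;
}
var viaContext = findInterfacesWithContext(interfaces, nameStartsWith, "IE");

// Closure style: the filter captures its context lexically.
var prefix = "IE";
var viaClosure = interfaces.filter(function (type) {
  return type.indexOf(prefix) === 0;
});
```

Both calls select the same elements; the closure version just has no extra parameter to thread through the API.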

by tonyg on 11/09/07

Rant: No CV?

Although we’ve been very pleased to welcome Felix and Simon in the last few months, and we’re very happy about the return of Sam Jones, we’re still on the lookout for fresh blood.

We really haven’t had much luck with recruiting strangers recently. Simon and Felix both came to us via personal introductions.

We’ve been sifting sadly through the CVs sent by recruiters. There have been a few exceptions, but these nearly always fail to raise a flutter of interest. Typically there’s no sense of the person behind the application, just a very narrow range of experience and interest. We really rely on the slow but steady trickle of CVs we get from people who actually know of LShift, and have perhaps even looked at this website and have actively chosen to apply. This kind of application has a much better chance of success here. We’ve had a few close runs in the past few months. Unfortunately, one of the interesting applicants just disappeared into the ether mid-way through the recruitment process (we’re not that bad surely?). We met someone else we’d hire tomorrow - sadly for us it turned out not to be such good timing for the applicant (yet, I hope). We’ve just heard from another interesting prospect too, but we’d still love to see more applicants like these, and a little bit more luck.

If you know open source and commercial projects, if you’re not afraid of computer science and you know how to complete a project then please cheer us up and get in touch. If you fit the bill but the job described here doesn’t appeal then we’d love to know why.

Also, we’d be very interested to know where all the CVs from women in this industry go - we haven’t seen one for ages.

by sophie on 26/07/07

Rant: Why does everything on the web require registration?

Some sites or services, quite reasonably, need to know who I am (and that I really am that person, to some acceptable level of verifiability). It’s usually because they hold data on my behalf, and neither I nor they want anyone else getting at that data.

But why does InfoQ require me to register to download a free PDF? They say, “… we’re happy to offer a free version for download, to get this knowledge in as many peoples hands as possible”; however, I had to complete a long form (to which I mostly invented answers), then verify my email address, then navigate back to the page. I hardly see how that is compatible with the stated aim.

After running into that, I almost despaired when I went to download the official MySQL JDBC driver. Almost: there’s actually a link below the register (or log in) options saying “No thanks, just take me to the downloads!”.

Yay for MySQL.com!

by mikeb on 12/06/07

Rant: E4X and the DOM

Reading through tonyg’s recent post I came across something I haven’t yet seen in use - inline XML within Javascript code. E4X, it seems, has landed. It is now available by default in Firefox and Rhino - other implementations will surely follow.

E4X, shorthand for ECMAScript for XML, is a nice language extension to Javascript adding native XML support. It adds XML types, a notation for literal XML and some basic operations. Previously, if you wanted to use XML in your Javascript code, you had two choices. Since XML has a textual representation, you could work with strings. This approach, however, is extremely error-prone, and is of limited use if you intend to do anything more sophisticated than just generating XML. The other approach is to use the XML DOM, which exposes the full power of XML using a consistent model, but is too verbose and so rather unpleasant to use.

Example: XML using strings / innerHTML

// Short, but notice how I forgot to close the paragraph
// Also, this is non-standard, and only works in HTML
myElement.innerHTML = '<p><b>Hello</b> <i>World</i>';

Example: XML using the DOM

// That must be one of the longest hello world
// examples I've ever written
var paragraph = document.createElement('p');
var bold = document.createElement('b');
var hello = document.createTextNode('Hello');
bold.appendChild(hello);
var italic = document.createElement('i');
var world = document.createTextNode('World');
italic.appendChild(world);
var space = document.createTextNode(' ');
paragraph.appendChild(bold);
paragraph.appendChild(space);
paragraph.appendChild(italic);
myElement.appendChild(paragraph);

As it happens, I am working on something that requires quite a lot of DOM manipulation within the browser, and, tired of constructing XML using the DOM API, I set out to give the new E4X capabilities of Firefox 1.5 a try. The disappointing reality, I soon found out, is that while E4X is very much present, it can’t be used for accessing or creating DOM elements. So if you plan on parsing some XML data, or generating XML from your program, you can use E4X, but DOM manipulation, arguably the most important activity involving XML in a browser, is not served by this new extension at all.

Example: How E4X could be used with the DOM

// This is structured XML, notice how there are no quotes
var p_xml = <p><b>Hello</b> <i>World</i></p>;
// But unfortunately you can't do that
var p_element = document.createElement(p_xml);
myElement.appendChild(p_element);

Javascript is a complete, general-purpose language, but in practice, it is being used exclusively as an extension for host environments. In Firefox, for example, it is used for adding program logic to the browser’s display formats - HTML, XUL and SVG. These formats can be expressed in text, but in order to manipulate them you need to access them using the DOM. For HTML, Firefox adopted the nasty innerHTML non-standard extension, which allows the user to access the contents of a node as text. Fortunately, this extension doesn’t work with non-HTML elements. E4X could have been the perfect replacement - a compromise between using the dumb textual representation and the structured, but counter-intuitive DOM.

Why doesn’t Firefox provide a way to construct and manipulate DOM elements using E4X? It’s hard to blame the Mozilla developers, given that the ECMA standard does not include any mention of the DOM or how to interact with it. Any extension they came up with would end up being the next generation innerHTML non-standard.

This failure of the E4X standard, together with tonyg’s previous critique of E4X, as well as other rumours from the Javascript development arena, has me wondering whether the standardisation efforts by ECMA have greatly benefited the language and its active community.

by Tom Berger on 24/07/06

Rant: Subclassing in JavaScript, part 1

What’s the right way to create a subclass in JavaScript?

Wrong question, say the JavaScript advocates. JavaScript isn’t one of those fuddy-duddy old class-based languages. It’s something much more exciting: a prototype-based language! So when you work with JavaScript, remember never to refer to “classes”, because JavaScript doesn’t have them, and it only shows you’re stuck in the old way of thinking.

I’m sure that these sentiments have done enormous harm to the reputations of real prototype-based languages, so let me banish them right here. JavaScript is not a prototype-based language; it most closely resembles a class-based language, but all its mechanisms for doing the work of a class-based language are horribly broken, which is why its advocates try to pretend it’s something else.

The most famous real prototype-based language is Self. In Self, you create new objects by cloning existing objects; this clones both methods and instance variables (“slots”). If you want to be able to add methods to an object after creation, you can create a special object (a “traits object”) and set it as a parent object of some other object; this child object will delegate to a parent object to find a value for a slot that it doesn’t have its own value for.

JavaScript works rather differently. Here’s the detail on how it works:

  • Functions in JavaScript can refer to a special magic variable “this” which is effectively an implicit parameter. If “f” is a function, you can call f.apply(x, [y,z]) and f(y,z) will be called with “this” bound to x.

  • Objects are dictionaries. Those dictionaries can map string keys to any Javascript value, including functions: you can validly do either or both of x.name = "foo"; and x.action = function () {};

  • “x.action();” is magic; it’s not the same as “tmp = x.action; tmp()” as it would be in a language like Python. Instead, it means “x.action.apply(x)”.

  • If you look up a key in an object which is not set in the object, and the object has the “__proto__” key set, then it will delegate to the “__proto__” object to try to find the key. (In IE, this key effectively has a non-string name that can’t be programmatically accessed, but the effect is the same). That object may delegate recursively to its “__proto__” in turn.

  • Functions are objects and have dictionaries associated with them, so you can do var F = function() {}; F.foo = "bar"; print(F.foo); and get back “bar” as you might expect.

  • “new F(2, 3)”, where F is a function object, does (roughly) the following:

    • creates a new blank object (call it “res”)
    • res.__proto__ = F.prototype
    • F.apply(res, [2, 3])
    • return res

And that’s it. Well, roughly - see the ECMAScript standard for the gritty details.
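The rules above can be checked in a few lines. This is a sketch, not a substitute for the standard: fakeNew is a hypothetical helper that follows the four steps listed for “new”, and it uses Object.create (ES5) to set the prototype link rather than assigning __proto__ directly. It ignores the case where the constructor returns an object of its own.

```javascript
function Point(x, y) { this.x = x; this.y = y; }
Point.prototype.sum = function () { return this.x + this.y; };

// Hypothetical emulation of "new F(...)" following the steps above.
function fakeNew(F, args) {
  var res = Object.create(F.prototype);  // blank object, delegating to F.prototype
  F.apply(res, args);                    // run the constructor with "this" bound to res
  return res;
}

var p = fakeNew(Point, [2, 3]);
var q = new Point(2, 3);

// "p.sum()" binds "this" to p; a function pulled out of the object first
// has to be given its "this" explicitly.
var detached = p.sum;
var rebound = detached.apply(p);

// Functions are objects too, so they can carry their own properties.
Point.foo = "bar";
```

Here p and q behave alike: neither has its own “sum” slot, and both find it by delegating to Point.prototype.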

From this you can immediately see that it’s not a prototype-based language; objects are not created by cloning a prototype, and indeed JavaScript doesn’t even come with a convenient way of cloning objects out of the box. What JavaScript calls prototypes are more like Self’s “traits objects”, which take the place of classes - they hold what is shared between objects of the same class. Objects are created with a constructor which also specifies the “traits object”, just like in a class-based language.

Not convinced? Wait until you see the discussion of inheritance.

Any object-oriented language provides some way or other of saying “I want these objects to be like those objects, except…”. In Python, when writing a class description you directly mark it as a subclass of another class, and those methods that are not overridden are inherited. In Self, you clone one of those objects to act as your new prototype, and modify it as you see fit, possibly by adding a traits object so that you can add methods applicable to the new object.

So here’s the question I came in with in JavaScript: how do I do it there? If you try to discuss this question with a JavaScript advocate, they’ll try to avoid the question, by tripping you up when you use words like “subclass” and “inheritance” to describe what you’re trying to do.

The truth of the matter is that

  1. this need is felt in any object-oriented language whether prototype-based or otherwise
  2. Self, for one, does provide a good solution to this
  3. JavaScript doesn’t.

In Self, you can create a subclass by cloning an example object of the sort you want to extend, extending it, and using it as a new example object. However, in JavaScript objects aren’t created by cloning; it uses constructors.

The usual way you’ll see inheritance done in JavaScript is as follows:

function Child () {
   // ... child constructor goes here
}

Child.prototype = new Parent();

a = new Child();

so “a” is now a Child object, and Child is a subclass of Parent. (Since JavaScript isn’t a prototype-based language I make no apologies for using class-based terminology here.) This approach is badly flawed, as this example demonstrates:

function Parent () {
    this.array = [];
}

a = new Parent();
b = new Parent();
a.array.push("a");
b.array.push("b");
print(a.array); // prints "a"
print(b.array); // prints "b"

function Child () {
    this.somethingelse = "somethingelse";
}

Child.prototype = new Parent();

aa = new Child();
bb = new Child();
aa.array.push("aa");
bb.array.push("bb");
print(aa.array); // prints "aa,bb"
print(bb.array); // prints "aa,bb"

What’s happened here? Every time the “Parent” constructor is called, it creates a new array and puts it in the “array” slot on the new object, so every “Parent” object has its own array. But this constructor is not called for the “Child” objects. Instead, the “Child” objects share a single instance of the “Parent” object for their __proto__, which includes a single instance of the “array” object. aa.array and bb.array refer to the same array, and so changing one changes the other.
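One way to confirm this diagnosis is to ask each object whether it actually owns its “array” slot. A sketch using the standard hasOwnProperty and Object.getPrototypeOf (the latter is ES5); the names mirror the example above, but the snippet is self-contained.

```javascript
function Parent() { this.array = []; }
function Child() { this.somethingelse = "somethingelse"; }
Child.prototype = new Parent();   // Parent's constructor runs once, here

var aa = new Child();
var bb = new Child();
aa.array.push("aa");
bb.array.push("bb");

// false: "array" is not set on aa itself, it is found via the prototype
var ownsArray = aa.hasOwnProperty("array");
// true: there is a single array shared by every Child
var sharedArray = aa.array === bb.array;
// true: the slot lives on the one Parent instance used as the prototype
var livesOnProto = Object.getPrototypeOf(aa).hasOwnProperty("array");
// "aa,bb"
var contents = aa.array.join(",");
```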

Now sometimes you want arrays (and other referenced objects) to be distinct between instances, and sometimes you want them to be shared between instances, but you never want them to be one thing in the superclass and another in the subclass. I have to confess at this point that I don’t know how Self addresses this problem, but I know that whatever solution it has will work consistently for derived behaviour as well as direct because the same mechanism is used for object creation.

I don’t know how to make JavaScript behave like a sensible prototype-based language; maybe there isn’t a way. But there are ways to make it behave like a sensible class-based language. After reading about a dozen different solutions to this online, I came to one which I think goes to the heart of the problem in the simplest way I can see, and which preserves the most flexibility. It’s reasonably efficient, too. This has become long enough, so I’ll describe that in a future blog entry.

Read Part 2

by Paul Crowley on 24/07/06
2000-9 LShift Ltd, 1st Floor Office, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK +44 (0)20 7729 7060