Archive for June, 2006

Thermostat code release

I’ve released my simple fan control program described in this entry (see also part one).

THIS CODE MAY MELT YOUR CPU - download only if you plan to read it, test it, and/or hack on it. The license makes it clear that it comes with no warranty.

I’ve already received an interesting email from Mark M Hoffman in reply to my post to the mailing list announcing it, drawing my attention to PID control loops. Looks like it could be a worthy avenue of investigation. I’ve replied with a description of one issue I see with applying a PID controller in this domain.

2 comments June 28th, 2006 Paul Crowley

‘Code as Data’ != Closures

Today I googled for “code as data”. The first hit that came up was this, a tutorial on Groovy that cheerfully proclaims that what is meant by “code as data” is closures! No, no, NO!!!!

“Code as data”, aka “code is data” signifies the ability to manipulate code, i.e. to construct and take apart programs. At the most primitive level that can be accomplished by representing source code as strings, which is possible in most programming languages. Going beyond that, many programming languages define an AST data structure, with a parser and pretty printer, that allows manipulation of code in a more structured manner, enforcing most, if not all, syntactic constraints of the language.

However, neither of these approaches fully capture what is meant by “code is data” in the Lisp tradition. There code really is data. There is no special AST data type and associated parser and pretty printer. Instead all programs are represented in terms of the ordinary data types of the language, such as symbols and lists. The concrete syntax of programs is subsumed by the standard external representation of data.

None of this has anything to do with closures.

2 comments June 28th, 2006 matthias

Making the client do all the work

This paper proposes to reduce the workload of SSL servers by making the clients carry as much of the crypto-related load as possible. I think it’s possible to do even better.

Key agreement: In the above, the server only has to do an RSA public key operation, which is cheap if the exponent is low (eg three). However, we can do even better (and have a stronger security bound too) by using the Rabin operation - modular squaring - instead. This is more than twice as fast as the RSA operation with exponent three. Normally, Rabin encryption is slowed down by certain steps that are needed to handle the fact that modulo a large semi-prime, most numbers that have square roots have four of them, and the recipient has to know which one you mean. However, modern KEM schemes greatly reduce this cost, and Rabin-KEM encryption is just about the fastest public key operation I know of, with the exception of certain probabalistic signature checking schemes.

Signatures: a trick has been missed here. Modern “one-time” signature schemes (eg “Better than BiBa“) can actually sign many messages before the private key must be discarded for security, which in an online/offline signature scheme greatly reduces the number of documents to be signed. For even greater efficiency, a hash tree can be used to sign many one-time keys simultaneously. At the cost of imposing a small latency on all clients, we can even discard the one-time signatures, avoiding a patent, and directly use hash trees; as many clients try to connect, the server can place the documents to be signed in a hash tree and sign them all with one operation. This scheme scales very nicely: the server performs its public key operation at a constant rate of, say, ten per second, and no matter how many clients are trying to connect these signatures will serve to authenticate the server to them all. The clients may have to wait an extra tenth of a second for the connection to complete, but this cost will be small in the cost of connecting to a busy server.

Client puzzles I’m not sure I understand why mixing up the client puzzle step and the public key generation step is beneficial.

With this scheme, the server only has to do one modular squaring per client - and even that only when the client has proven its worth by solving a client puzzle. I wonder if it’s possible to do even better?

2 comments June 26th, 2006 Paul Crowley

Thermostat defeat

When I started on this, I thought I’d be able to dash off a script to keep my CPU fan quiet in a few hours. I’ve just spent far too much of this weekend obsessively hacking on it and testing it, and after creating a tool of great sophistication, I have basically given up in defeat. I’m now using a thermostat-like approach; either the fans are on full or on minimum, nothing in between.

First installment of the saga

Continue Reading 2 comments June 26th, 2006 Paul Crowley

Static analysis of Erlang communication

I had a brief email exchange with the developers of Dialyzer, the static analyzer (some might call it a type checker) for Erlang programs. Currently Dialyzer only performs analysis on the functional fragment of Erlang and I was enquiring whether to extend that to handle communication. That would allow the detection of basic input/output mismatches, e.g. when a message is sent to a process that does not match any of the patterns it is willing to receive.

Going further, one might be able to employ the various techniques developed for process algebras to reason about the concurrent behaviour of Erlang programs and, for example, detect deadlocks and enforce information flow security properties. A good example of such a tool is TyPiCal. It would be amazing to have something like that for Erlang. After all, what makes Erlang interesting is not the functional programming aspect currently checked by Dialyzer, but its support for concurrency, distribution and fault-tolerance. It is incredibly difficult to correctly implement systems that involve the latter. If there is any area of programming in which we want the help of static analysis then this is it!

Anyway, it turns out that there are no immediate plans to extend Dialyzer in that direction. However, I was pointed at some related research that I had hitherto been unaware of: Karol Ostrovský’s PhD thesis, which in Part II describes the

  • sound instantiation of Kobayashi’s generic type system for the pi-calculus to session types,

  • extension of session types to multi-session types (which, afaict, handle sessions that involve asynchronous comms, and servers that handle multiple sessions without spawning),

  • application of multi-session types to type check communication of Erlang processes.

Overall this looks like a promising attempt at constructing a process-algebra-based type system that is decidable and yet expressive enough to reason about non-trivial real-world protocols (IMAP4 is used as an example). The theory behind it seems to be quite involved, but that could just be due to the presentation format - a thesis rather than a paper. It will be interesting to see whether this research is carried any further and eventually materialises in tools for Erlang.

Add comment June 26th, 2006 matthias

E4X: I want my S-expressions back

E4X is a new ECMA standard (ECMA-357) specifying an extension to ECMAScript for streamlining work with XML documents.

It adds objects representing XML to ECMAScript, and extends the syntax to allow literal XML fragments to appear in code. It also supports a very XPath-like notation for use in extracting data from XML objects. So far, so good - all these things are somewhat useful. However, there are serious problems with the design of the extension.

If E4X objects were real objects, if there were a means of splicing a sequence of child nodes into XML literal syntax, and if E4X supported XML namespace prefixes properly, most of my objections would be dealt with. As it stands, the overall verdict is “clunky at best”.

These are my main complaints:

  • It doesn’t do anything like Scheme’s unquote-splicing, and so using E4X to produce XML objects is verbose, error-prone and dangerous in concurrent settings.

    There seems to be no way of splicing in a sequence of items - I’d like to do something like the following:

    function buildItems() {
      return [<item>Hello</item>,
              <item>World!</item>];
    }
    var doc = <mydocument>{buildItems()}</mydocument>;
    

    and have doc contain

    <mydocument>
      <item>Hello</item>
      <item>World!</item>
    </mydocument>
    

    What actually results is the more-or-less useless

    <mydocument>Hello,World!</mydocument>
    

    The closest I can get to the result I’m after is

    function buildItems(n) {
      n.mydocument += <item>Hello</item>;
      n.mydocument += <item>World!</item>;
    }
    var doc = <mydocument></mydocument>;
    buildItems(doc);
    
  • It’s full of redundant redundancy - it’s as verbose as XML, when you can do so much better.

  • There’s no toXML() method (or similar) for use in papering over the yawning chasm between the XML objects and the rest of the language: you can’t even make a Javascript object able to seamlessly render itself to XML.

  • The new types E4X introduces aren’t even proper objects - they’re a whole new class of primitive datum!

  • Because they’re not proper objects, you can’t extend the system. You ought to be able to implement to an interface and benefit from the language’s XPath searching and filtering operations. E4X is so close to offering a comprehension facility for Javascript, but it’s been short-sightedly restricted to a single class of non-extensible primitives.

  • You can’t even construct XML tags programmatically! If the name of the tag doesn’t appear literally in your Javascript code, you’re out of luck (unless you resort to eval…) [[Update: I was wrong about this - you can write <{expr}> and have the result of evaluating expr substituted into the tag.]]

  • E4X XML objects have no notion of namespace prefixes (which are required for quality implementations of XPath and anything to do with XML signatures). Prefixes only turn up in the API as a means of producing (namespaceURI,localname) pairs. This is actually how it should be, but because there’s already broken software out there that depends on prefix support, by not supporting prefixes properly you preclude ECMAScript+E4X from being used for XML signatures or ECMAScript-native XPath implementations.

In my opinion, E4X violates several programming language design principles: most importantly, those of regularity, simplicity and orthogonality, but also preservation of information, automation and structure. SXML, perhaps in combination with eager comprehensions, provides a far superior model for producing and consuming XML. Sadly, there’s no real alternative for ECMAScript yet - we’re limited either to library extensions, or to using the DOM without any syntactic or library support at all.

5 comments June 24th, 2006 tonyg

Bruce J. MacLennan’s Programming Language Design Principles

As far as I can tell, no-one on the web has yet summarised in a single page all of the design principles MacLennan develops in his excellent Principles of Programming Languages (2nd edition, 1986, ISBN 0-03-005163-0): so here they are. I hope they’re as useful to others as they have been to me.

  • Abstraction: Avoid requiring something to be stated more than once; factor out the recurring pattern.
  • Automation: Automate mechanical, tedious, or error-prone activities.
  • Defense in Depth: Have a series of defences so that if an error isn’t caught by one, it will probably be caught by another.
  • Information Hiding: The language should permit modules designed so that (1) the user has all of the information needed to use the module correctly, and nothing more; and (2) the implementor has all of the information needed to implement the module correctly, and nothing more.
  • Labeling: Avoid arbitrary sequences more than a few items long. Do not require the user to know the absolute position of an item in a list. Instead, associate a meaningful label with each item and allow the items to occur in any order.
  • Localised Cost: Users should only pay for what they use; avoid distributed costs.
  • Manifest Interface: All interfaces should be apparent (manifest) in the syntax.
  • Orthogonality: Independent functions should be controlled by independent mechanisms.
  • Portability: Avoid features or facilities that are dependent on a particular machine or a small class of machines.
  • Preservation of Information: The language should allow the representation of information that the user might know and that the compiler might need.
  • Regularity: Regular rules, without exceptions, are easier to learn, use, describe, and implement.
  • Security: No program that violates the definition of the language, or its own intended structure, should escape detection.
  • Simplicity: A language should be as simple as possible. There should be a minimum number of concepts, with simple rules for their combination.
  • Structure: The static structure of the program should correspond in a simple way to the dynamic structure of the corresponding computations.
  • Syntactic Consistency: Similar things should look similar; different things different.
  • Zero-One-Infinity: The only reasonable numbers are zero, one, and infinity.

2 comments June 24th, 2006 tonyg

Overview of Javascript modes for Emacs

Emacsen.org has a nice roundup of the (apparently only) four javascript-mode implementations for Emacs. I went for number three, Karl Landström’s javascript.el, and it’s been working very well.

1 comment June 24th, 2006 tonyg

Adapting C3 Linearization for Java

One of the interesting issues in implementing dynamic dispatch for Java is that the basic C3 linearization algorithm isn’t a very good fit for the complexities of Java’s subtyping. (Note: the following paragraphs rely on the reader having a basic understanding of the details of C3 linearization.)

Java lists a class’s implemented interfaces separately from its superclass. C3 requires, for each class, a list of direct superclasses as input - so to adapt it for use with Java, we have to choose how to combine each class’s superclass with its implemented interface list: for instance, the superclass could be placed at the beginning or at the end of the interface list.

In Java any interface is assignable to Object. So for C3 to make sense, all interfaces which have no super-types have Object as a super-type. If Object is in the super-type list of a type, it must be the last thing - otherwise linearization will always be impossible. So the most obvious thing to do is to include the super-class last in the list of super-types.

This mostly works, but for the way collections from java.util are implemented: The various abstract collections implement their corresponding interface, but the actual implementations don’t directly implement the corresponding interface, so for example AbstractSet implements Set, and HashSet extends AbstractSet, but does not implement Set directly. This is a common pattern.

If, then, you always choose to put the super-class at the end of the list of super-types while performing linearization for C3, this results in an inconsistent linearization for all of Java’s built in collections.

So I ended up doing the following by default: For any super-class other than Object, the super-class goes first in the list of super-types. If the super-class is Object, it gets pushed to the end. This works in an intuitive way in lots of cases, and the implementation supports pluggable ordering, should you need to do something different.

Add comment June 23rd, 2006 david

Multimethods for Java

Dynamic dispatch is a mechanism for selecting a method based on the runtime types of the parameters supplied. Java dispatches instance methods dynamically, using the runtime type of the receiver to choose the code to invoke and ignoring the types of the other parameters (just like Python and many other object-oriented languages). This is called single dispatch. Unfortunately, Java’s dispatching is limited in two important ways: it doesn’t allow class extensions, and it doesn’t support multiple dispatch, as implemented in many other object-oriented languages.

I have written some code which uses reflection and proxy generation to conveniently implement dynamic multiple dispatch for Java. It uses C3 linearization to determine the method to invoke. This algorithm was originally devised for Dylan. You can get the distribution here.

This implementation supports subtyping of both arrays and primitive Java types such as int, byte, char, but does not yet support Java 1.5’s generics. The subtyping relation for arrays and primitive types is based on Java’s notion of assignability - see the documentation for details.

Here’s a trivial example of using the dynamic dispatch library:

  import net.lshift.java.dispatch.DynamicDispatch;

  // define an interface
  public interface NumberPredicate {
      public boolean evaluate(Number n);
  }

  // implement it for some argument types
  public class Exact {
      public boolean evaluate(Float f)  { return false; }
      public boolean evaluate(Double f) { return false; }
      public boolean evaluate(Number n) { return true; }
  }

  // create a dynamic dispatcher
  NumberPredicate exact = (NumberPredicate)
    DynamicDispatch.proxy(NumberPredicate.class, new Exact());

Now, code making use of exact can call the evaluate method with a Number instance, and the DynamicDispatch proxy that is backing the NumberPredicate interface will find the most appropriate method on Exact to invoke based on the runtime type of the argument to evaluate. So, for instance:

  exact.evaluate(new Float(12.3)); // returns false.
  exact.evaluate(new Integer(34)); // returns true.

There are several reasons having dynamic dispatch (and multiple dispatch) is useful when programming for Java:

  • If you want to extend an existing set of classes (for instance, to add an aspect - see below), normally you’d create an interface encapsulating your new feature, and make all the classes in the set implement it. If you are not able to modify the classes, one approach is to create wrapper classes for each, and somehow choose a wrapper class at runtime based on the type of the object you’re working with. You might implement this by myObject.getWrapperClass() - but wait! This makes getWrapperClass a new method that needs to be added: precisely the problem you set out originally to solve.

    Dynamic (single) dispatch helps you out here by conveniently automating the required wrapping and method selection.

  • You might also want to use multiple dispatch, dispatching on the types of multiple arguments. Neither the Java language nor the Java virtual machine supports multiple dispatch. In some cases overloading suffices, but in many it does not.

    Dynamic multiple dispatch reinterprets Java’s notion overloading as actual multimethod dispatch. Syntactically, you’re writing overloaded methods - but using DynamicDispatch, the semantics are those of full multiple-dispatched generic functions.

Dynamic dispatch is useful for adding aspects. For example, you might use dynamic dispatch to write a Java object pretty printer, or custom serializer. I’ve employed it for writing a general equality function which is independent of the object’s own implementation of equals, which I find useful in unit tests. I’ll write about that in a later post.

If this kind of thing interests you, MultiJava, a compiler for a multiply-dispatched variant of Java, might be worth a look.

8 comments June 23rd, 2006 david

Previous Posts

Calendar

June 2006
M T W T F S S
« May   Jul »
 1234
567891011
12131415161718
19202122232425
2627282930  

Posts by Month

Posts by Category