Archive for October, 2005

Looking for needles in haystacks

Here’s a proper programming challenge, simple to state, and in the end quite challenging to solve. A function produces a 64-bit output given a 64-bit input. It mostly behaves like a random function, so given a random input, most outputs occur with probability between 0 and 2-63; in other words, for a randomly-chosen 64-bit y, there are usually between 0 and 2 solutions to the equation f(x) = y. However, a few occur with much greater frequency - more like 2-32 or so. The function is pretty quick and easy to calculate, but its internal structure defies direct analysis. How would you identify these more frequent occurrences by experiment? How big a computer would you need to do it in something like a few hours to a few days? Think about it for a while before you read on.

Continue Reading Add comment October 26th, 2005 Paul Crowley

Slate

I’ve just taken another look at where the Slate language project has gotten up to. Last time I looked was almost a year ago now. There’s been a lot of progress! The time seems ripe for Slate to get an event loop, so that proper networking and threading can be built on it.

Add comment October 25th, 2005 tonyg

Big bucks for big bugs

I was recently working on a project where the client wanted us to add some features and updates to a web-facing database which they had paid some big name consultants to develop for them.

Wow, did what we received look impressive. It used some of the latest industry buzz-words: SQL Server, ASP.NET, Web services, to name a few; It came with pages and pages of documentation, with “wireframe models”, field-by-field breakdowns for each page, and even a schema diagram for the database. The documentation itself must have cost quite a bit (at a guess, it must have taken a Word-monkey at least 2 weeks to prepare).

However, when we started digging in, things didn’t look quite as impressive. Firstly, the documentation, detailed and copious as it was, was useless because it didn’t correspond to the actual code. Even the database schema didn’t match - it looked as if the documentation and the implementation had evolved independently.

As we started to take it apart and work out how it worked, more problems started to surface. Here’s a list of some of the more serious bugs we found.

  1. The schema for one of the tables was too wide; there were several very large VARCHAR columns (thousands of characters per column) in the definition of this table. The schema defined records which could be as big as 24000 bytes, whereas SQL Server has a limit of 8060 bytes. This means that inserts (and updates) of large records could fail or be truncated. The client never came up against this problem presumably because they hadn’t created a record big enough. How could the developers have just ignored the warning messages that SQL Server must have given?
  2. The website had a search page which sloppily used string concatenation to build the corresponding SQL query. Because no efforts were made to clean-up the user-provided input, this part of the website was vulnerable to SQL-injection attacks! In principle, a remote website visitor could type carefully crafted values into a search field and get the website to execute arbitrary SQL queries!

There were other weird parts in the code. For example, at one stage, the code concatenates several fields into a string using the pipe character (”|”) as a separator, and then later splits it again. Unfortunately, it didn’t make any effort to either check for/escape legitimate pipe characters in the field values, nor document that the pipe character shouldn’t be used in these fields. Why do this? VB.NET has perfectly usable record types (look for the Structure keyword - even QuickBASIC had the TYPE keyword since the late 80’s).

Working on this project has just given me more evidence to believe that the big name consultancies don’t give value for money.

Add comment October 19th, 2005 francis

Truncated differential cryptanalysis of five rounds of Salsa20

eSTREAM have just put my paper online: Truncated differential cryptanalysis of five rounds of Salsa20 (PDF) (discussion, Wikipedia on Salsa20). This doesn’t break the whole cipher, just a seriously reduced version.

Experimentation played a key role in finding this result. I found the first differential by writing a short Python program that implemented a pretty naive differential-characteristic finding strategy. But when I went to test experimentally that the characteristic worked, I found it was there with eight times the frequency I expected. Further experimentation showed that this was due to clustering differential trails resulting in the same characteristic. From there, it was straightforward to just experimentally count the occurrence of lots of characteristics, and then it makes sense to use lots of them to make the attack far more effective. As a result, many fewer differential pairs are required.

The discovery of trails whose probability is not correctly predicted by theory was also a great and exciting surprise. I’m now thinking about how to investigate how to account for these discrepancies; once we can account for them, maybe we can make use of them to build more powerful attacks.

Add comment October 17th, 2005 Paul Crowley

Spoon and the Object Visualiser

Take a look at this part of NetJam.ORG, in which Craig Latta builds a visual display of a running Smalltalk Image. Be sure to check out the movie of the first 50ms or so of a running image. Smalltalk gets all the cool toys.

Add comment October 17th, 2005 tonyg

Something about early, and often?

Naturally, almost as soon as I’d posted about my extensions to pregexp, I discovered a bug in said extensions.

It was something I should have caught, too: in the code I had a constant (3) which I couldn’t explain. I had expected it to be 2, not 3, and the need for the extra increment wasn’t clear to me. Now that I’ve caught the bug, I know why the increment seemed to work: it was compensating for a missing end-of-stream detection in another place.

The revised patch to pregexp version 20050502 can be downloaded here.

Add comment October 12th, 2005 tonyg

When will the Haskell community finally get their act together

…and produce a usable package system?

The absence of a standardised and user-friendly mechanism to package & distribute libraries, and to locate and install such libraries has been a major stumbling block in the adoption of non-mainstream programming languages for a long time. I wrote about it here in the context of Scheme.

Haskell is actually in a much better position than Scheme for addressing this, since there are far fewer implementations (and a single one - GHC - with an overwhelming market share), and far fewer incompatibilities between implementations. I have high hopes for Cabal. However, so far it seems to have failed to deliver on its promises, as evidenced by the following snippet from the installation instructions for WashNGo:

In some setting, it will be complicated to get everything to work with the database connection. The reason is that some versions of c2hs (0.13.6 in particular) do not install properly as a package: the hs-libraries component of the package description is missing with the consequence that programs do not link properly. If you can fix they you fly.

Installing newer versions (0.14.2 or 0.14.3) is not possible out of the box because they require cabal with version >= 1.0.1. Alas, GHC 6.4.1 ships with 1.0. Installing cabal-1.1.4 does not help either because…

 .../c2hs-0.14.3 > ./Setup.hs build
 Preprocessing executables for c2hs-0.14.3...
 Building c2hs-0.14.3...
 Chasing modules from: c2hs/toplevel/Main.hs
 Could not find module `CForeign':

Oh well.

That’s exactly the kind of thing that will turn off all but the most determined geeks from exploring Haskell.

2 comments October 11th, 2005 matthias

xxexpr.ss - an SXML-to-XML converter

The SSAX XML parsing- and processing-library provides robust, high-quality XML reading tools for Scheme, but doesn’t include a general purpose XML writer. Over the past couple of years, a few of my projects have had a need to convert SXML-like data to an XML 1.0 external representation, and so I’ve written a portable SXML-to-XML printing library (both a snapshot and a darcs repository). The library has been used with Chicken, MzScheme, and SISC, and currently includes module wrappers for Mzscheme and SISC (or other psyntax-based Schemes).

The library is parameterized over a choice of double- or single-quotes for attribute printing, and can, if required, be instructed to use explicit close-tags when an empty-tag is encountered. It provides procedures for producing a string representation of an XML fragment, for printing an XML fragment directly to a port, and for pretty-printing an XML fragment with indentation. For example,

(pretty-print-xxexpr
 (let ((title "My Page"))
   (list
    '(*pi* xml (version "1.0"))
    `(html (head (title ,title))
           (body (h1 ,title)
                 (p "Hello, world!"))))))

produces the following output (don’t mind the invalid XML PI, that’s something WordPress is doing - the actual output from the library is well-formed!):

< ?xml version='1.0'?>
<html>
    <head>
        <title>My Page</title>
    </head>
    <body>
        <h1>My Page</h1>
        <p>Hello, world!</p>
    </body>
</html>

You can either browse the code, or retrieve it using

$ darcs get http://www.lshift.net/~tonyg/xxexpr/

Add comment October 11th, 2005 tonyg

A SRFI-10-style extension to JSON

Background

The data language JSON is a great replacement for XML for many applications. It’s very similar in spirit to Lisp and Scheme S-expressions, as well as to XML: it is a pure data language, with no intrinsic semantics.

XML doesn’t allow direct literal representation of any data types other than strings and XML nodes (to oversimplify slightly); S-expressions don’t allow direct literal representation of anything but atoms, pairs and vectors; and JSON doesn’t allow direct literal representation of anything but atoms, lists, and dictionaries.

There are currently no extensions to XML to remedy this, to allow programmers to extend the XML reader to support literal notation for other classes of object. For S-expressions, however, there is SRFI 10: Sharp-Comma External Form, which allows the S-expression reader to be extended to allow read-time construction of literal objects.

Reader/Writer Extensibility for JSON

I’ve implemented a SRFI-10-like construction for JSON, allowing extension of the JSON reader and writer, which makes it easier to use JSON for things like serialization or high-level messaging protocols.

You can download the modified json.js from here.

The basic extension is the addition of @constructor ... notation to the reader: whenever the JSON parser sees an “@” sign, it parses a single identifier, and then a complete JSON object called the argument. The identifier is looked up in a table of named constructor functions, and if a match is found, the corresponding constructor function is called with the argument. If no matching constructor function is found, a parse error is signalled.

The corresponding extension to the writer is the toJsonString method. If an object that the JSON.stringify function encounters possesses a toJsonString implementation, the method is called and its result is returned as the stringification of the object.

Example

As an example, imagine an application where a client and server shared some notion of a database of objects, indexable by some unique identifier. The client side uses proxies to manipulate these objects, and the proxies send JSON RPC requests to the server (perhaps using XmlHttpRequest) to query and update the database held by the server. There are two ways we could represent one of these database objects: as just its unique identifier, which requires a manual lookup on both the server and client side whenever a reference to the object is to be sent over the wire, or using the @constructor extension.

MyDatabaseObject.prototype.toJsonString = function () {
  return "@dbobject " + this.id;
}

An example message from the client to the server might be

{
  "action": "retrieveNextObject",
  "arg": @dbobject 42
}

When the server parses the message out of the HTTP request (or other transport), it calls JSON.parse with an extra argument:

var requestFromClient = JSON.parse(requestText, {
  "dbobject": function (arg) {
    return DB.lookupById(arg);
  }
});

When the parser sees the “@dbobject ...” part of the input string, it will call the constructor given to JSON.parse, which looks up and returns the object in the database by its identifier, making for a convenient, hassle-free deserialization of a complex object.

2 comments October 11th, 2005 tonyg

Continuation-Passing Style in Javascript

As part of my recent experiments with Javascript, I happened to want a sleep function. Interestingly, there isn’t one available in browser-side Javascript environments - instead, there are four functions

setInterval
clearInterval
setTimeout
clearTimeout

and they take functions to call after a certain number of milliseconds have gone by. This leads to programming in continuation-passing style (see here, here, here, here, and here):

// ... some code ...
// At this point, we want to pause for one second before continuing.
setTimeout(function () {
    // ... code here continues running after 1000 milliseconds have gone by.
}, 1000);

It seems that a (laudable!) design decision has been made for avoiding blocking operations in browser scripting languages. It’s not just sleep-a-likes, either: the decision affects the design of XmlHttpRequest, which is used in AJAX-like server callbacks. The way you receive the server’s response to an XmlHttpRequest message is by supplying a callback function. This again leads to a continuation-passing style of programming - see the definition of AJAJ.proxy in this file. Here’s an example of AJAJ.proxy in action, as part of a simple AJAX+JSON chat application:

function rpc(action, arg, k, kfail) {
    var urlPrefix = (new RegExp("(.*/)([^/]+)")).exec(document.location)[1];
    AJAJ.proxy(urlPrefix + "chat-server/simple-chat.asp")
        ({action: action, arg: arg}, k, kfail);
}

function consoleRpc(action, arg) {
    rpc(action, arg,
        function (r) { console.recordOutput(formatConsoleRpc(action, arg),
                                            JSON.stringify(r)); },
        function (e) { console.recordError(formatConsoleRpc(action, arg),
                                           JSON.stringify(e)); });
}

The two functions k and kfail passed to the rpc function serve as the success and failure continuations, respectively. If the XmlHttpRequest succeeds, then the function starting with “function (r) { ...” will be called with the response from the server; otherwise, the function starting with “function (e) { ...” will be called with an exception object that reports on the failure. We’re using closures to extend the available control structures of the language.

Add comment October 11th, 2005 tonyg

Previous Posts

Calendar

October 2005
M T W T F S S
« Sep   Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Posts by Month

Posts by Category