Archive for August, 2006
Tim Malbon says he’s pleased to be involved in this Web thing, but appears to conflate lack of full assimilation with brain-washing.
One can be a tiny bundle of techno-lust and not believe that successive versions of the Web will lead directly to a shiny new future.
Posting photos to Flickr (Web2.0) seems like something of a tenuous, circuitious route to finding housing for homeless children (shiny future). On the other hand, if you followed that link you’re contributing to something like a counter-argument.
It’s not that I begrudge boosterism, because some interesting things come about that way. I just find the juxtaposition of They see [the Web] as ‘work’ rather than the defining social revolution du jour
with They have the air of people who belong to a mind control cult
as, well, exquisite.
Having said all that: I quite agree with Tim’s sentiments that Small is Good, creating neat technology isn’t work, and I’d rather be passionate about what I do than uninterested (happily, I am the former).
August 25th, 2006
mikeb
Toward the end of last week, I found myself wondering about implementing a diff algorithm in Javascript. It turns out there’s already at least one available attempt at this, by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString("the quick", "the quick fox"), for example), so I went hunting for source material to implement my own.
I ended up finding J. W. Hunt and M. D. McIlroy, “An algorithm for differential file comparison”, Bell Telephone Laboratories CSTR #41 (1976) (postscript), which turned out to be quite implementable.
Here’s my implementation: hunt-mcilroy.js. With luck, I’ve even implemented something close to the described algorithm.
As an example, the output from
diff("the quick brown fox jumped over".split(/\s+/),
"the quick fox jumps over".split(/\s+/))
is
[{common:["the","quick"]},
{file1:["brown"],
file2:[]},
{common:["fox"]},
{file1:["jumped"],
file2:["jumps"]},
{common:["over"]}]
The next step is to implement a diff3 equivalent, and then I’ve the tools to implement rudimentary version-control. It’d sure be a fine thing to see TiddlyWiki evolve darcs-like distributed change-propagation…
August 15th, 2006
tonyg
In my last entry, I described a problem, and promised a solution. There are many solutions online and in various JavaScript toolkits; as far as I can tell this one is the most elegant of the bunch, and I’d be pleased to see it replace the various other solutions out there.
The main goals of this approach are:
The problem described in the previous post are avoided. What is per-instance stays per-instance, and what is shared stays shared; subclassing doesn’t change one into the other.
The code to create a subclass looks reasonably elegant and straightforward and not too wordy.
The result is reasonably efficient in space and time
The approach here also meets one nice but less important goal:
- The approach can be used to subclass from a class not specifically built to work with this approach - in other words, when not subclassing, you make classes in the normal way.
I shan’t include a comparison with any other specific solutions here, but if there are any you’re interested in, ask me.
Continue Reading August 3rd, 2006
Paul Crowley
One of the new projects that I’m working on involves a messaging infrastructure in Erlang. Without boring you with the details, the basic idea is that there are two types of messages, A and B and these are both sent to a thread (or a process in Erlang). One A must be paired with one B before the A and B can be discarded. Performance is an issue so this pairing must be fast. Several approaches were developed in trying to make this go very fast:
- The first approach is also the dumbest. All A and B messages are sent to the same process. The process deals with these messages in a fifo order and thus must internally maintain queues of As and Bs, matching them as becomes possible. The problem with this is that flooding the process with As or Bs cripples performance because of processing the queue in fifo order. So a DoS attack is very possible and would often accidentally occur (most traffic is bursty).
- The next idea is to not process the messages in a fifo order. This is possible because of Erlang’s receive statement which can do pattern matching on the messages in the queue. The problem is that the process must scan every message in the queue in a fifo order each time through the receive block until it finds one that matches. So whilst the DoS attack can’t happen in the same way, it can still cripple performance as the message queue must be scanned until you reach the end and find the first message of the other type so that the pairing can occur. So this cripples performance again.
- Use three processes. Send messages of one type to one process and of the other type to another process. These processes are then buffers. They process messages in a fifo order (which is fast) and put all the messages they receive into an internal queue. The third process (the pairer) then sends fetch messages to these buffers which then send a predetermined number of messages in their internal queue to the pairer. The pairer then scans its messages in a non-fifo way but because the message queue is effectively bounded, there is no significant performance drop. The points are these:
- There is potentially a DoS attack possible which prevents the buffers from receiving the fetch message. However, firstly by processing the message queue in a fifo order it will be received, just possibly not immediately; and secondly, the buffers simply remove A or B messages from their message queues and place them in an internal queue, thus there is very little processing done. So getting through several thousand A or B messages does not take and real amount of time.
- The pairer only asks for more messages (via fetch messages sent to the buffers) iff it has exhausted its quota of messages: i.e. it asks for n messages from each buffer and will only ask for more if it has received n messages from each buffer. This means that its queue will never be more than 2 n messages long.
- Careful testing reveals that n = 7 is the best performance on the hardware I have available. This balances the cost of the pairer processing messages out of order with the extra round trip to the buffers. Lower n means that there are too many round trips sending fetch messages to the buffers but the non-fifo processing is really cheap. Higher values of n mean the pairer’s message queue gets too big so the non-fifo processing costs too much.
- By using three processes, it can make good use of thread level parallelism hardware. The two buffers can get through their message queues in parallel as can the pairer. There’s also therefore the possibility of distribution across multiple machines.
- It’s very possible to alter the pairer so that it sends the fetch messages after it has achived m pairings where m < n. This means the pairer’s message queue would be a maximum of 2 n + 2(n - m) messages long but would reduce the delay between the pairer sending the fetch messages and the buffers being able to receive the fetch message and send messages on to the pairer - effectively the fetch message ends up higher up in the buffers’ message queues.
It’s quite ironic that a language that seems, at least on the face of it, ideal for messaging-type applications turns out to have implicit semantics that, if not hindering, at least make you think much harder when trying to implement certain requirements…
August 1st, 2006
matthew