technology from back to front

Archive for March, 2006

A trust metric enabled Wikipedia?

Wikipedia has been described as “the encyclopaedia that works in practice but not in theory”. In theory, an encyclopaedia that anyone can edit should suffer from out-of-control trolling and vandalism, and collapse under its own weight in an instant. In practice, that hasn’t happened; vandals attack Wikipedia daily, but it still does very well as a useful source of information, especially if you bear in mind that its primary competition is not Britannica or other paid-for sources, but the rest of the Internet. Nonetheless, a recent controversy over malicious insertion of false information into Wikipedia had some concluding that the very process by which Wikipedia is written has “incurable flaws”.

Are these flaws incurable? The problem – that anyone can say anything – is one the Internet itself also suffers from. But in 1996/7, something changed which preserved its usefulness in the face of the avalanche of noise and spam that threatened to bury it: Google arrived. To Internet users increasingly finding that their searches were returning nothing but synthesized advertising pages, Google provided a way to pick the wheat from the chaff. The fundamental innovation that made Google possible we now know as a trust metric.

All trust metrics work on the same principle: if I think well of Ann, and Ann thinks well of Bob, I might be more interested in what Bob has to say than just any random Joe. We operate by this principle all the time when we introduce people to others or make recommendations. A trust metric automates the principle, applying it to a very large number of people and recommendations, to find recommendations we would never have been able to find by the manual method. Google’s trust metric simply treated every hyperlink as a recommendation; if lots of trusted pages link to you, then you must be trusted too.
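To make the Ann-and-Bob principle concrete, here is a minimal sketch in Python. It is not Google's actual algorithm; it is a simple seeded propagation in the same spirit, and the function name, the damping value and the toy data are all assumptions made for illustration.

```python
def propagate_trust(links, seed, damping=0.85, rounds=50):
    """Spread trust outwards from `seed` along recommendation links.

    links maps each name to the list of names it recommends.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    trust = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(rounds):
        nxt = {n: 0.0 for n in nodes}
        # A fixed share of trust always restarts at the seed.
        nxt[seed] = 1.0 - damping
        # Each node passes on a damped share of its own trust,
        # split evenly among the nodes it recommends.
        for src, targets in links.items():
            for t in targets:
                nxt[t] += damping * trust[src] / len(targets)
        trust = nxt
    return trust


# Hypothetical data: I recommend Ann, Ann recommends Bob, nobody recommends Joe.
links = {"me": ["ann"], "ann": ["bob"], "bob": [], "joe": []}
trust = propagate_trust(links, seed="me")
print(sorted(trust, key=trust.get, reverse=True))
```

Bob ends up with some trust purely because Ann vouches for him, while Joe, whom nobody in my network recommends, gets none.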

Better than that, Google’s trust metric is attack resistant – it is designed to give good results in the face of active attackers trying to artificially inflate their rank. This simple requirement is one that many trust metrics fail dismally; for example, until a couple of years ago the open source discussion site kuro5hin trivially allowed users to inflate their own rank by creating new “sock puppet” users who rated them highly.
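The sock puppet problem can be sketched in a few lines of Python. This is neither kuro5hin's nor Google's real scheme: the naive rank simply counts ratings, and the "seeded" rank is a deliberately crude stand-in for an attack resistant metric that only counts ratings from users reachable from a trusted seed. All names and data are hypothetical.

```python
from collections import deque


def naive_rank(ratings, user):
    """Count every positive rating, no matter who gave it."""
    return sum(1 for rater, rated in ratings if rated == user)


def reachable_from(ratings, seeds):
    """Users reachable from the seeds by following ratings (breadth-first)."""
    trusted = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for rater, rated in ratings:
            if rater == current and rated not in trusted:
                trusted.add(rated)
                queue.append(rated)
    return trusted


def seeded_rank(ratings, user, seeds):
    """Count only ratings given by users the seeds (transitively) vouch for."""
    trusted = reachable_from(ratings, seeds)
    return sum(1 for rater, rated in ratings
               if rated == user and rater in trusted)


# (rater, rated) pairs; mallory has created three sock puppets.
ratings = [
    ("alice", "bob"), ("bob", "carol"),
    ("sock1", "mallory"), ("sock2", "mallory"), ("sock3", "mallory"),
]
print(naive_rank(ratings, "mallory"), seeded_rank(ratings, "mallory", {"alice"}))
print(naive_rank(ratings, "carol"), seeded_rank(ratings, "carol", {"alice"}))
```

Under the naive count mallory outranks carol; once ratings only count when they come from someone the seed vouches for, the sock puppets contribute nothing. Plain reachability still fails as soon as one trusted user rates an attacker, which is why a serious metric has to limit how much trust can flow through any single user.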

Can attack resistant trust metrics save Wikipedia? Noted free software author Raph Levien thinks so – he has been researching trust metrics for many years, and expresses his frustration that they are not treated as the primary solution to abusive users despite the amazing success of Google in applying them for this purpose.

I agree with him. If you combine trust metrics and wikis, you can get some interesting possibilities.

Paul Crowley

Base conversions in Scheme

For a particular protocol we’re implementing, we need to fit some possibly large serial numbers (in the millions) into a rather limited number of characters (5); the answer, naturally, is to encode the number.

To give ourselves room, we decided to use base 64. Jakarta Commons has some handy codec classes, which saves us the number-to-character translation; so all we need is to convert our serial numbers into bytes.

For that, Tony came up with this pretty one-liner in Scheme:

;; unfold is from SRFI 1 and cut from SRFI 26; bytes come out least significant first
(define (number->bytes num) (unfold zero? (cut remainder <> 256) (cut quotient <> 256) num))

The body works in general for converting a number into a list of its digits in any base, least significant digit first; for example,

(unfold zero? (cut remainder <> 16) (cut quotient <> 16) 257) -> (1 0 1)
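For readers who don't speak Scheme, the same idea can be sketched in Python. The `unfold` helper below is a hand-rolled imitation of the SRFI 1 procedure, and `number_to_digits` is a name made up for this illustration; neither is part of our actual code.

```python
def unfold(stop, mapper, successor, seed):
    """Build a list by applying mapper to each seed until stop(seed) holds."""
    out = []
    while not stop(seed):
        out.append(mapper(seed))
        seed = successor(seed)
    return out


def number_to_digits(num, base):
    """Digits of a non-negative integer in the given base, least significant first."""
    return unfold(lambda n: n == 0,      # stop once nothing is left
                  lambda n: n % base,    # emit the lowest digit
                  lambda n: n // base,   # then drop it
                  num)


print(number_to_digits(257, 16))    # [1, 0, 1]
print(number_to_digits(258, 256))   # [2, 1], i.e. 2 + 1 * 256
print(number_to_digits(0, 256))     # [], zero yields no digits at all
```

Two things are worth noticing: the digits come out least significant first, and zero produces an empty list, so a caller that needs at least one byte has to handle that case itself.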


Java memory profiling with jmap and jhat

My colleagues and I have just spent over a week tracking down a
repeated OutOfMemoryError in a fairly complex web application. In the
process we looked at the jmap and jhat memory profiling tools.

Starting with Java 1.5, Sun has been shipping a neat tool called jmap,
which allows you to attach to any 1.5 JVM and obtain heap layout
information, class histograms and complete heap snapshots. The neat
thing is that you don’t have to configure the JVM with any special
options, and that it therefore runs exactly as during normal
operation. This is important since the analysis then reflects how the
application really behaves.

Obtaining information on the heap layout is a near-instantaneous
operation, so it doesn’t slow down execution. By contrast, taking
class histograms and heap snapshots can take considerable time, during
which the execution of application code is stalled. For example, a
snapshot of a ~700MB heap took nearly half an hour to complete on a
3GHz dual-Xeon Redhat box.

Btw, it took a while to figure out the correct options for taking a
heap snapshot:

jmap -heap:format=b <pid>

This will produce a file called “heap.bin” in the current directory.
The option is not mentioned in the jmap docs, though running

jmap -help

does list it.

Taking a heap snapshot is all very well, but what do you do with it?
That’s when we ran into problems. There seem to be hardly any tools
out there that can read the binary heap dump format. Sun ships one as
part of JDK 1.6 (Mustang), called jhat, of which an older version is
available for earlier JVMs. The tool is run like this:

jhat -J-mx768m -stack false heap.bin

which sets the memory available to jhat to a value just above the size
of the heap to be analysed and suppresses tracking of object
allocations (leaving that on just seems to result in thousands of

On the surface, jhat looks quite promising – it allows
you to traverse the heap, display histograms etc – all via a web
browser interface. However, on many occasions jhat failed to read our
heap dumps, stopping with the error

Unrecognized heap dump sub-record type: 254

The sub-record type value varies depending on the snapshot, but the
rest of the error is always the same. Having spent hours waiting for a
process to reach a critical state, and another half an hour taking the
heap snapshot, it is more than a little frustrating if all you have to
show for it at the end is a few hundred megs of useless heap dump data.

Another problem with jhat is that the analysis capabilities are very
basic. When one is analysing heaps with millions of objects, one
desperately needs tools that can group information in sensible
ways. For example, it would be incredibly useful to select a class and
get a percentage breakdown of what classes refer to how many instances
of the chosen class, and then extend this view all the way back to the
root set. jhat just doesn’t have anything like that.
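To show the kind of view we have in mind, here is a rough Python sketch over a toy heap. The data structure (object id mapped to class name and outgoing references) and the function name are inventions for this example and have nothing to do with jhat's internals or the real heap dump format; it also covers only a single step back from the chosen class, not the full path to the root set.

```python
from collections import defaultdict


def referrer_breakdown(heap, target_class):
    """For each referring class, the percentage of target_class instances it holds.

    heap maps an object id to (class_name, [ids of referenced objects]).
    An instance referenced from two classes counts for both, so the
    percentages may add up to more than 100.
    """
    targets = {oid for oid, (cls, _) in heap.items() if cls == target_class}
    held_by = defaultdict(set)
    for cls, refs in heap.values():
        for ref in refs:
            if ref in targets:
                held_by[cls].add(ref)
    return {cls: 100.0 * len(ids) / len(targets) for cls, ids in held_by.items()}


# Hypothetical heap: four byte arrays, three held by a map, one by a thread-local.
heap = {
    1: ("HashMap", [3, 4, 5]),
    2: ("ThreadLocal", [6]),
    3: ("byte[]", []), 4: ("byte[]", []), 5: ("byte[]", []), 6: ("byte[]", []),
}
breakdown = referrer_breakdown(heap, "byte[]")
print(breakdown)
```

Applied repeatedly to each referring class in turn, a view like this would walk back towards the root set, which is exactly the sort of drill-down we were missing.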

Still, even the basic analysis did point us in some useful
directions of enquiry in tracking down our memory leak. In fact, even
just looking at the class histograms produced by jmap – no need to
take a full heap snapshot and fire up jhat – produced good clues. The
final insight came from another one of jmap’s outputs – the heap
layout information. This pointed out that we were running out of space
in the “permanent generation” – even though we had set its limit to an
unusually high value. There isn’t much information
around on the role of the permanent generation. Our theory is that it
is a space in which objects are allocated that the JVM believes are
very unlikely to ever be garbage collected. The permanent generation
still takes part in a full GC, but unlike the old generation there is
probably no copying and compacting going on. Anyway, the problem
turned out to be that we were calling into
Jython in a way that resulted in it
associating a few megabytes of Java reflection information with
individual threads. Tomcat is using several hundred threads and, more
importantly, is pooling them. Do the maths.
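For the impatient, the maths looks roughly like this in Python. The two figures are placeholders standing in for "a few megabytes" and "several hundred threads"; they are not measurements from our application.

```python
def pinned_megabytes(per_thread_mb, pooled_threads):
    """Memory held on to when every pooled thread keeps its own copy alive."""
    return per_thread_mb * pooled_threads


# Assumed values for illustration only.
per_thread_mb = 3
pooled_threads = 300
total_mb = pinned_megabytes(per_thread_mb, pooled_threads)
print(total_mb, "MB")   # 900 MB
```

Because the threads are pooled rather than discarded, none of that memory ever becomes eligible for collection.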


London Metropolitan University Guest Lecture

Chris Stephenson from LShift is presenting a guest lecture titled ‘From Marketing Management to Management of Marketing’ at London Metropolitan University on April 27 2006.

Traditional marketing and management academics tend to concern themselves with what managers in organisations should do and the decisions that they should take. This usually involves functions such as planning and performance management, or the use of tools and techniques such as product lifecycle analysis, market segmentation and the 4 Ps. An alternative approach is to look at what marketing managers actually do in practice.

As a result, my experience of Product Management will be described using a series of metaphors that give perspectives of what it is to be a Product Manager. This follows a new body of thought into the descriptive, rather than prescriptive, ways that we can examine Management, originating from Professor Geoff Easton at Lancaster University.

Through examining this description of Product Management, the session will conclude with a discussion as to whether marketing management is predominantly about marketing or whether it is an issue of general management.

The lecture is open to all London Metropolitan students and will be held at London Metropolitan University in room G01, Stapleton House, at 5.30pm on April 27 2006.



