Last month we saw one way how to produce decent error messages while parsing. I’ve also mentioned that parsing with derivatives is a *non-blocking* parsing technique. What’s that actually mean?

Let’s look at a counterexample first. PetitParser assumes that it will parse a complete whatever-it-is-you’re-parsing: a complete Smalltalk method, a complete SIP packet, a complete HTML document. If you attach a PetitParser to a socket, and you give it half a SIP packet, it will wait for more input, blocking the execution of that thread. (Really, in most Smalltalks that means blocking a `Process`

, which represents a green (preemptively multitasked) thread.

If you’re writing a network server and you use a blocking parser, that means you’ve just added a denial of service vector: an attacker can open connections which will use up more and more threads, and give you a headache.

In contrast, a non-blocking parser will read what it can from its input, and return. In the case of parsing with derivatives, it will return a parser that contains parses for the input received so far. So let’s try it out!

| d num s |

d := DerivingParser deriverBlock.

num := (LiteralSet tokens: ($0 to: $9)) star reduce: [:digits |

digits inject: 0 into: [:acc :each |

(acc * 10) + (each asInteger - $0 asInteger)]].

s := '12a3' reading positionTracking parsing: num.

{s get parseNull.

s get parseNull.

[s get] on: ParseError do: [:e | e messageText]}

"=> {a Set(1).

a Set(12).

'Error at line 1, column 3: Expected one of <$0, $1, $2, $3, $4, $5, $6, $7, $8, $9>, found a <$a>.'}"

d := DerivingParser deriverBlock.

num := (LiteralSet tokens: ($0 to: $9)) star reduce: [:digits |

digits inject: 0 into: [:acc :each |

(acc * 10) + (each asInteger - $0 asInteger)]].

s := '12a3' reading positionTracking parsing: num.

{s get parseNull.

s get parseNull.

[s get] on: ParseError do: [:e | e messageText]}

"=> {a Set(1).

a Set(12).

'Error at line 1, column 3: Expected one of <$0, $1, $2, $3, $4, $5, $6, $7, $8, $9>, found a <$a>.'}"

Note that each `get`

pulls precisely one character off the underlying stream. In the context of a network server, we can simply read whatever’s in our socket buffer, and the parser will process what it can and return. No denial of service, and it means you can employ standard asynchronous IO in a thread that services many sockets as and when they receive data.

Under the hood, the implementation’s quite simple: `#parsing:`

returns an `XTTransformReadStream`

. This stream lets us read an arbitrary number of elements from a source stream, process them in any fashion we like, and then write an arbitrary number of elements to a destination stream (from which downstream streams will consume). Here, we read one element from `input`

, calculate the derivative of the current parser with that object, record that parser, and push it to `output`

. We do have to fiddle a bit with the output – `contentsSpecies: Array`

so that XTreams knows how to store the interim parsers.

XTPositionTrackingStream >> parsing: aParser

"Return an XTream that takes a parser P, consumes a character C, calculates

the derivative of P with respect to C, and produces a new parser. Throws a

ParseError if it unexpectedly derives Empty."

| d derivative parse |

derivative := aParser.

d := DerivingParser deriverBlock.

^ (self transforming: [:input :output |

parse := [:next :p | | drv |

drv := ParserCompacter value: (d value: next value: p).

drv isEmptyParser

ifTrue: [ParseError

signalLine: input lineNumber

column: input columnNumber

error: drv error]

ifFalse: [drv]].

derivative := parse value: input get value: derivative.

output write: derivative])

contentsSpecies: Array;

yourself.

"Return an XTream that takes a parser P, consumes a character C, calculates

the derivative of P with respect to C, and produces a new parser. Throws a

ParseError if it unexpectedly derives Empty."

| d derivative parse |

derivative := aParser.

d := DerivingParser deriverBlock.

^ (self transforming: [:input :output |

parse := [:next :p | | drv |

drv := ParserCompacter value: (d value: next value: p).

drv isEmptyParser

ifTrue: [ParseError

signalLine: input lineNumber

column: input columnNumber

error: drv error]

ifFalse: [drv]].

derivative := parse value: input get value: derivative.

output write: derivative])

contentsSpecies: Array;

yourself.

One last note for those not familiar with Smalltalk syntax, the bit with the semicolon is a *cascade*. Messages separated by semicolons indicate a sequence of messages sent to the receiver of the first message. So

| arr |

arr := #(1 2 3).

arr

at: 1 put: 0;

at: 2 put: 1;

at: 3 put: 2;

yourself.

"is equivalent to"

arr at: 1 put: 0.

arr at: 2 put: 1.

arr at: 3 put: 2.

arr yourself.

arr yourself == arr "=> true"

arr := #(1 2 3).

arr

at: 1 put: 0;

at: 2 put: 1;

at: 3 put: 2;

yourself.

"is equivalent to"

arr at: 1 put: 0.

arr at: 2 put: 1.

arr at: 3 put: 2.

arr yourself.

arr yourself == arr "=> true"

The last message – `#yourself`

– simply returns the original receiver. In `XTPositionTrackingStream >> parsing:`

that means we return the `XTTransformReadStream`

.

2000-14 LShift Ltd,
1st Floor, Hoxton Point, 6 Rufus Street, London,
N1 6PE,
UK+44 (0)20 7729 7060
Contact us