technology from back to front

Compacting cyclic parsers

When I wrote my Smalltalk deriving-with-parsers library, I ran into an issue with compaction: cycles in the parser. Self-referencing parsers (corresponding to left- and right-recursive rules) occur naturally, so I couldn’t hide from the problem. I investigated two ways to introduce circularity as well as how to compact these graphs: delegates, and “sutures”.

Let’s first look at delegates:

DerivingParser subclass: #DelegateParser
  instanceVariableNames: 'parser'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Parsing-Derivatives'.

DelegateParser >> parser
  parser ifNil: [UnresolvedDelegate accessing: self].
  ^ parser.

DelegateParser >> parser: aParser
  parser := aParser.

DelegateParser >> isResolved
  ^ parser notNil.

DelegateParser >> isDelegate
  ^ true.

DelegateParser >> subParsers
  ^ parser
    ifNil: [#()]
    ifNotNil: [{parser}].

Compaction maps parsers to other parsers that are hopefully smaller. Most parsers are already compact – it’s only the Union and Cat parsers that might be compacted. The literature contains the rules for compaction. I won’t reproduce them here save one, to serve as an example. Imagine we have constructed this parser partway through a parse, a Red(ucing parser) wrapping a Red wrapping an EpsStar: ((EpsStar treeSet: {1. 2}) reduce: [:x | x * 2]) reduce: [:y | y + 1]. Obviously a (slightly) simpler parser would compose the two reducing functions. Thus one of the compaction rules for a Red is “the compaction of a Red wrapping a Red wrapping a parser p is a Red wrapping p whose reducer is the composition of the reducers of the two Reds”. In this case that’s (EpsStar treeSet: {1. 2}) reduce: [:x | (x * 2) + 1].

Imagine a simple cyclic grammar:

| ones d |
d := DelegateParser new.
ones := DerivingParser emptyString or: ($1 asParser then: d).
d parser: ones.

We can see that this particular parser is already compact: the leaf parsers are by definition compact, and the Union and Cat parsers are combinations of compact parsers. But consider writing a compacter. We traverse the graph, and find a delegate. The delegate points to a parser “up” the graph, so compaction maps this delegate to another delegate pointing at… oh, wait. We don’t know yet, because we need to complete compaction to know this. Hm. OK. Instead, we’ll come back to it later. Delegates that don’t form a cycle are easy to compact: they’re just delegates pointing to the compaction of the pointed-to parser. So given the basic algorithm

Object subclass: #SutureCompacter
  instanceVariableNames: 'cache'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Parsing-Derivatives-Support'.

SutureCompacter class >> value: aParser
  ^ self new value: aParser.

SutureCompacter >> initialize
  super initialize.
  cache := Dictionary new.

SutureCompacter >> value: aParser
  | visited |
  cache at: aParser ifPresent: [:compacted | ^ compacted].
  visited := IdentitySet new.
  self compact: aParser.
  self resolveDelegates: (cache at: aParser).
  ^ cache at: aParser.

let’s look at #compactDelegate: (called by #compact:) and #resolveDelegates: a bit closer.

SutureCompacter >> compactDelegate: aDelegateParser
  "The compaction of a delegate parser is the compaction of its subparser.
   A delegate parser might not be resolvable yet, because if the delegate
   closes a cycle, the parser to which it should point will not have been
   mapped yet."

  | mappedSubparser |
  "This delegate completes a cycle. Its subparser hasn't been mapped yet.
   Mark its presence, and bail. The post-walk will resolve things."

  (cache contains: [:s | s reference = aDelegateParser parser])
    ifFalse: [^ DelegateParser new].
  mappedSubparser := self cached: aDelegateParser parser.
  (aDelegateParser parser == mappedSubparser reference)
    ifTrue: [^ aDelegateParser].
  (cache includesKey: aDelegateParser) ifTrue: [
    ^ DelegateParser new parser: mappedSubparser].
 
  ^ DelegateParser new.

SutureCompacter >> resolveDelegates: aParser
  PreOrderParserWalker
    walk: aParser
    doing: [:p |
      (p isDelegate and: [p isResolved not])
        ifTrue: [ | preimage |
          preimage := cache keyAtValue: (cache values detect: [:sutr |
                        sutr reference = p]).
          p parser: (self cached: preimage parser)]].

We’ll see that sutr in a bit. So #resolveDelegates: just walks the compacted parser, filling in the blanks left behind by the cycle-causing delegates. We have one final trick up our sleeves: while we’re looking up parsers in our cache, and we’re looking up a delegate, it does no harm to return the pointed-to parser (and might make the graph one node smaller):

SutureCompacter >> cached: aParser
  "If a delegate doesn't form part of a cycle, it does no harm to point to
   the delegate's reference instead of the delegate. This also makes the
   parser one node smaller, potentially."

  | p suture |
  suture := cache at: aParser.
  p := suture reference.
  (p notNil and: [p isDelegate and: [(self formsACycle: p) not]]) ifTrue: [
    ^ cache at: p parser].
  ^ suture.

SutureCompacter >> formsACycle: aDelegateParser
  "If aDelegateParser is not yet resolved, treat it as though it forms a cycle."
  [aDelegateParser parser] on: UnresolvedDelegate do: [^ true].
  ^ aDelegateParser isChildOf: aDelegateParser parser.

So with this we can now compact parsers with or without cycles. But what about that sutr bit? What’s up with that? sutr is short for Suture. Hopefully the name suggests “ties things together” and “it disappears by itself after surgery”:

DerivingParser subclass: #Suture
  instanceVariableNames: 'reference'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Parsing-Derivatives'.

SutureCompacter class >> pointingAt: anObject
  ^ self new pointingAt: anObject.

SutureCompacter >> pointingAt: anObject
  self reference: anObject.

SutureCompacter >> reference
  ^ reference.

SutureCompacter >> reference: anObject
  reference := anObject.

SutureCompacter >> subParsers
  ^ reference
    ifNil: [#()]
    ifNotNil: [{reference}].

So far there’s nothing really to differentiate between a Suture and a DelegateParser. That’s fine. Sutures are just markers. We’ll change the basic algorithm ever so slightly:

SutureCompacter >> value: aParser
  | visited |
  cache at: aParser ifPresent: [:compacted | ^ compacted].
  visited := IdentitySet new.
  self compact: aParser.
  self resolveDelegates: (cache at: aParser).
  self tieSutures: aParser. "<-- This line's new."
  ^ cache at: aParser.

Before we go any further, I should mention the magic trick we’re about to use. Smalltalk has a feature called #become:: a become: b means that, across the entire image, everything that points to b now points to a and vice versa. This is implemented using #becomeForward:, a “this thing becomes that thing (in one atomic operation)”. So how do we use this? As we walk the parser in a pre-order traversal, we populate the parser -> compacter parser mapping (cunningly misnamed as cache in the code):

SutureCompacter >> compact: aParser
  "Pre-order mark the parser as visited; post-order compact the parser (so
   that all its subparsers (except in a cycle) have been compacted)."

  cache at: aParser ifPresent: [:p | ^ p].
  cache at: aParser put: Suture new.
  aParser subParsers do: [:p | self compact: p].
  (cache at: aParser) reference: (aParser class
    caseOf: {
      "Optimise case testing by putting the most common cases first"
      [Empty] -> [aParser].
      [EmptyString] -> [aParser].
      [EpsStar] -> [aParser].
      [Literal] -> [aParser].
      [Cat] -> [self compactCat: aParser].
      [Union] -> [self compactUnion: aParser].
      [Red] -> [self compactRed: aParser].
      [DelayedParser] -> [self compactDelayed: aParser].
      [DelegateParser] -> [self compactDelegate: aParser]}
    otherwise: [aParser]).

You’ve seen me switch on classes before: functional decomposition is sometimes rather useful.

SutureCompacter >> cached: aParser
  "If a delegate doesn't form part of a cycle, it does no harm to point to
   the delegate's reference instead of the delegate. This also makes the
   parser one node smaller, potentially."

  | p suture |
  suture := cache at: aParser.
  p := suture reference.
  (p notNil and: [p isDelegate and: [(self formsACycle: p) not]]) ifTrue: [
    ^ cache at: p parser].
  ^ suture.

So we walk the parser pre-order, stashing Sutures in our map as we go. We compact the parsers in a post-order fashion (so that composite parsers always (except for cycles) have their subparsers compacted). During compaction we update the compaction mapping ((cache at: aParser) reference: bigCaseStatement in #compact:‘s definition above). And finally, we pull a great big switcheroo, and make the Sutures vanish, leaving a pristine, cyclic parser:

SutureCompacter >> tieSutures: aParser
  | sutures references |
  sutures := cache values.
  references := sutures collect: #reference.
  "Put the pointed-to objects in the cache rather than the sutures."
  sutures do: [:s | cache at: s put: s reference].
  "Magically make everything that points to a Suture point to the Suture's
   reference, tying the cyclic knots."

  sutures elementsForwardIdentityTo: references.

So after all that, what was the point? A neater API! Now we don’t have to add artificialities to describe cyclic parsers;

| ones |
"This line just shuts the Compiler up, which would otherwise
 moan about the undefined use of ones in the second line."

ones := nil.
ones := DerivingParser emptyString or: ($1 asParser then: [ones] asParser).
by
Frank Shearar
on
30/12/12
 
 


seven − 2 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us