technology from back to front

Three approaches to ambiguous grammars

We have many tools in our parsing toolbox. Today let’s look at how three different parsing techniques handle ambiguity caused by choice.

First, yacc. yacc handles ambiguities in your grammar with, if you’re lucky, complaints about shift/reduce conflicts. If you’re not so lucky you’ll get reduce/reduce conflicts. Either way, you will spend ages trying to figure out (a) how the conflict arose and (b) how to beat your grammar into the shape that yacc demands. In other words, yacc forces you to resolve any ambiguities yourself, through the application of pain. Life is too short.

Second, PEG parsers. We’ve seen these before on this blog. PEGs are, by definition, unambiguous. “By definition” because PEGs don’t use BNF-style “or” constructs, but prioritised choice. That means that given a / b, either a will succeed or if it fails then b will get a chance at parsing the stream.

The problem is that the grammar you define might not be the grammar you meant to define. Suppose you have a grammar expr = "foo" | "foobar" | "bar" and you parse foobar. You might think that the result of that parse would be the greedy match, "foobar". Alas, not. "foo" wil succeed, leaving "bar" on the stream. The "foobar" rule will not have a chance to run. So the general rule is to make the greediest rules run first. In a highly factored grammar it might not be obvious how to do this!

And finally, the new kid on the block. When you parse with derivatives, you create parse trees for all possible parse trees. Given the toy grammar above, you will thus have {"foo", "foobar"} as the set of possible parses. So again, there’s ambiguity, and you have to solve it yourself. Or not, of course: you might want all possible parses because you really are parsing something ambiguous, or maybe you don’t care and just want the first parse.

by
Frank Shearar
on
31/01/13
  1. Ceri Storey
    on 04/02/13 at 6:22 pm

    Here’s another one: Apparently the GLL parsing algorithm can return forests, much like the derivative parsing algorithm does. Daniel Spiewak describes this in his article “Unveiling the Mysteries of GLL” (sadly, I don’t think he finished the series), and there’s a more in-depth article by Vegard Øye on implementing General Parser Combinators in Racket. Interesting stuff.

 
 


eight × = 16

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us