Chapter 5Ambiguity

Ambiguity is not an error. The fact that Invisible XML grammars allow ambiguity is a feature. It’s also generally observed as the combination of a grammar and an input. Consider this grammar:

  |product: dessert-topping ; floor-wax .
  |dessert-topping: "custard" ; "cream" ; "Shimmer" .
  |floor-wax: "paste-wax"; "quick-shine"; "Shimmer" .
Example 5.1SNL™ Shimmer Sketch

(If you aren’t familiar with the Saturday Night Live “Shimmer Floor Wax” sketch, now would be the time to go search the web.)

Parsed against the input “custard”, it says dessert topping:

  |$ coffeepot -pp -g:examples/ambig01.ixml custard
  |<product>
  |   <dessert-topping>custard</dessert-topping>
  |</product>

Parsed against the input “paste-wax”, it says floor wax:

  |$ coffeepot -pp -g:examples/ambig01.ixml paste-wax
  |<product>
  |   <floor-wax>paste-wax</floor-wax>
  |</product>

Parsed against “Shimmer”, it says:

  |$ coffeepot -pp -g:examples/ambig01.ixml Shimmer
  |There are 2 possible parses.
  |<product xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <floor-wax>Shimmer</floor-wax>
  |</product>

This is an example of an essential ambiguity; you can’t “fix” this grammar. But let’s dig a little deeper anyway. For a small number of parses, one way to investigate the ambiguity is to simply list them all with --parse-count:all:

  |$ coffeepot -pp -g:examples/ambig01.ixml --parse-count:all Shimmer
  |<ixml parses='2' totalParses='2'>
  |<product xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <floor-wax>Shimmer</floor-wax>
  |</product><product xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <dessert-topping>Shimmer</dessert-topping>
  |</product></ixml>

Alternatively, we can ask the parser to describe the ambiguity with --describe-ambiguity:

  |$ coffeepot -pp -g:examples/ambig01.ixml --describe-ambiguity --no-output Shimmer
  |Found 2 possible parses.
  |Ambiguity:
  |At 
  |	X dessert-topping «0-7» => 'S', 'h', 'i', 'm', 'm', 'e', 'r'
  |	  floor-wax «0-7» => 'S', 'h', 'i', 'm', 'm', 'e', 'r'

This indicates that the characters from 0 to 7 in the input can be matched as “dessert-topping” or “floor-wax”. This is another case where the forest graph can be useful.

Figure 5.1The parse forest for “Shimmer”

It is possible for grammars to be infinitely ambigous. Consider this trivial grammar:

  |expr: expr ; 'a' .
Example 5.2An infinitely ambiguous grammar

There’s no practical way for coffeepot to enumerate infinitely many parses, so it essentially ignores those edges in the graph. Parsing “a” yields:

  |$ coffeepot -pp -g:examples/ambig02.ixml a
  |There are 2 possible parses.
  |<expr xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <expr>a</expr>
  |</expr>

There are 2 parses because that’s what can be rendered. A description of the ambiguity, reveals that there are infinitely many parses:

  |$ coffeepot -pp -g:examples/ambig02.ixml --describe-ambiguity --no-output a
  |Found 2 possible parses (of infinitely many).
  |Infinite ambiguity:
  |At
  |	X 'a', 0, 1
  |	  expr «0-1» => 'a'

This is also evident in the graph:

Figure 5.2The forest for an infinitely ambiguous parse

That loop is the source of infinite ambiguity. The parses that coffeepot will enumerate are:

  |$ coffeepot -pp -g:examples/ambig02.ixml --parse-count:all a
  |<ixml parses='2' totalParses='2'>
  |<expr xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <expr>a</expr>
  |</expr>
  |<expr xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">a</expr>
  |</ixml>

If you’re trying to eliminate ambiguity from a grammar that you think should be unambiguous, look for multiple ways to match “nothing”. For example, if you have a nonterminal that matches zero or more whitespace characters, make sure it isn’t possible for it to match in two different places.

5.1Describing ambiguity

Ambiguity can be described in one of three ways: using a compact, text format; using a minimal XML format; or using the full XML format that will be used when choosing alternatives with a function or XPath expression.

In the examples below, the parse output is from the numbers grammar and input introduced in Chapter 6, Choosing among alternatives.

5.1.1Textual ambiguity descriptions

The text format identifies the location in the result tree with a simple XPath expression and then lists each option in the form “nonterminal => right-hand-side”. An “X” marks the selected alternative.

For example:

  |At /number-list[1]/number[3]
  |	X hex «9-11» => $2_hex-digit-plus
  |	  decimal «9-11» => $3_decimal-digit-plus

This indicates that at the third “number” element, there are two choices “hex” or “decimal”. Both span input tokens 9-11. The nonterminals on the right hand side are from the BNF version of the grammar. The “hex” alternative has been selected.

5.1.2XML ambiguity descriptions

The XML format presents two small XML fragments that represent the parse trees under consideration. (Within the XML, some nonterminal names are rewritten to make them valid XML element names. It is possible, though probably unlikely, that the rewritten names could be the same as actual nonterminals in the grammar.)

For example:

 1 |At /number-list[1]/number[3]
   |Alternative 1 of 2 (selected):
   |<number mark="-" from="9-11">
   |   <hex from="9-11">
 5 |      <_2_hex-digit-plus mark="-">…</_2_hex-digit-plus>
   |   </hex>
   |</number>
   |Alternative 2 of 2:
   |<number mark="-" from="9-11">
10 |   <decimal from="9-11">
   |      <_3_decimal-digit-plus mark="-">…</_3_decimal-digit-plus>
   |   </decimal>
   |</number>

As before, this indicates that at the third “number” element, there are two choices “hex” or “decimal” and the first choice has been selected.

5.1.3API XML ambiguity descriptions

The “API” XML format is designed to provide as much information as possible such that it’s easy to write XPath expressions to analyze it. The result is somewhat less easy to read.

These are the XML alternatives that function libraries and XPath expressions are evaluated against. Here, the entire document is shown, but the actual context item in each case is the element with the alternative attribute.

For example:

 1 |At /number-list[1]/number[3]
   |Alternative 1 of 2 (selected):
   |<number-list xmlns:a="https://nineml.org/ns/describe-ambiguity"
   |              a:version="1.0"
 5 |              name="number-list"
   |              mark="^">
   |   <number name="number"/>
   |   <number name="number"/>
   |   <number alternative="1" name="number" from="9" to="11" mark="-">
10 |      <hex name="hex" from="9" to="11" mark="^">
   |         <a:nonterminal name="$2_hex-digit-plus" mark="-">…</a:nonterminal>
   |      </hex>
   |   </number>
   |</number-list>
15 |Alternative 2 of 2:
   |<number-list xmlns:a="https://nineml.org/ns/describe-ambiguity"
   |              a:version="1.0"
   |              name="number-list"
   |              mark="^">
20 |   <number name="number"/>
   |   <number name="number"/>
   |   <number alternative="2" name="number" from="9" to="11" mark="-">
   |      <decimal name="decimal" from="9" to="11" mark="^">
   |         <a:nonterminal name="$3_decimal-digit-plus" mark="-">…</a:nonterminal>
25 |      </decimal>
   |   </number>
   |</number-list>

As before, this indicates that at the third “number” element, there are two choices “hex” or “decimal” and the first choice has been selected.