Chapter 5. Ambiguity
Ambiguity is not an error. The fact that Invisible XML grammars allow ambiguity is a feature. It’s also generally observed as the combination of a grammar and an input. Consider this grammar:
|product: dessert-topping ; floor-wax .
|dessert-topping: "custard" ; "cream" ; "Shimmer" .
|floor-wax: "paste-wax"; "quick-shine"; "Shimmer" .
(If you aren’t familiar with the Saturday Night Live “Shimmer Floor Wax” sketch, now would be the time to go search the web.)
Parsed against the input “custard”, it says dessert topping:
|$ coffeepot -pp -g:examples/ambig01.ixml custard
|<product>
|<dessert-topping>custard</dessert-topping>
|</product>
Parsed against the input “paste-wax”, it says floor wax:
|$ coffeepot -pp -g:examples/ambig01.ixml paste-wax
|<product>
|<floor-wax>paste-wax</floor-wax>
|</product>
Parsed against “Shimmer”, it says:
|$ coffeepot -pp -g:examples/ambig01.ixml Shimmer
|There are 2 possible parses.
|<product xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
|<floor-wax>Shimmer</floor-wax>
|</product>
This is an example of an essential ambiguity; you can’t “fix”
this grammar. But let’s dig a little deeper anyway. For a small number
of parses, one way to investigate the ambiguity is to simply list them
all with --parse-count:all
:
|$ coffeepot -pp -g:examples/ambig01.ixml --parse-count:all Shimmer
|<ixml parses='2' totalParses='2'>
|<product xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
|<floor-wax>Shimmer</floor-wax>
|</product><product xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
|<dessert-topping>Shimmer</dessert-topping>
|</product></ixml>
Alternatively, we can ask the parser to describe the ambiguity with
--describe-ambiguity
:
|$ coffeepot -pp -g:examples/ambig01.ixml --describe-ambiguity --no-output Shimmer
|Found 2 possible parses.
|Ambiguity:
|At
|X dessert-topping «0-7» => 'S', 'h', 'i', 'm', 'm', 'e', 'r'
|floor-wax «0-7» => 'S', 'h', 'i', 'm', 'm', 'e', 'r'
This indicates that the characters from 0 to 7 in the input can be matched as “dessert-topping” or “floor-wax”. This is another case where the forest graph can be useful.
It is possible for grammars to be infinitely ambigous. Consider this trivial grammar:
|expr: expr ; 'a' .
There’s no practical way for coffeepot to enumerate infinitely many parses, so it essentially ignores those edges in the graph. Parsing “a” yields:
|$ coffeepot -pp -g:examples/ambig02.ixml a
|There are 2 possible parses.
|<expr xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
|<expr>a</expr>
|</expr>
There are 2 parses because that’s what can be rendered. A description of the ambiguity, reveals that there are infinitely many parses:
|$ coffeepot -pp -g:examples/ambig02.ixml --describe-ambiguity --no-output a
|Found 2 possible parses (of infinitely many).
|Infinite ambiguity:
|At
|X 'a', 0, 1
|expr «0-1» => 'a'
This is also evident in the graph:
That loop is the source of infinite ambiguity. The parses that coffeepot will enumerate are:
|$ coffeepot -pp -g:examples/ambig02.ixml --parse-count:all a
|<ixml parses='2' totalParses='2'>
|<expr xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
|<expr>a</expr>
|</expr>
|<expr xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">a</expr>
|</ixml>
If you’re trying to eliminate ambiguity from a grammar that you think should be unambiguous, look for multiple ways to match “nothing”. For example, if you have a nonterminal that matches zero or more whitespace characters, make sure it isn’t possible for it to match in two different places.
5.1. Describing ambiguity
Ambiguity can be described in one of three ways: using a compact, text format; using a minimal XML format; or using the full XML format that will be used when choosing alternatives with a function or XPath expression.
In the examples below, the parse output is from the numbers grammar and input introduced in Chapter 6, Choosing among alternatives.
5.1.1. Textual ambiguity descriptions
The text format identifies the location in the result tree with a simple XPath expression and then lists each option in the form “nonterminal => right-hand-side”. An “X” marks the selected alternative.
For example:
|At /number-list[1]/number[3]
|X hex «9-11» => $2_hex-digit-plus
|decimal «9-11» => $3_decimal-digit-plus
This indicates that at the third “number” element, there are two choices “hex” or “decimal”. Both span input tokens 9-11. The nonterminals on the right hand side are from the BNF version of the grammar. The “hex” alternative has been selected.
5.1.2. XML ambiguity descriptions
The XML format presents two small XML fragments that represent the parse trees under consideration. (Within the XML, some nonterminal names are rewritten to make them valid XML element names. It is possible, though probably unlikely, that the rewritten names could be the same as actual nonterminals in the grammar.)
For example:
1 |At /number-list[1]/number[3]
|Alternative 1 of 2 (selected):
|<number mark="-" from="9-11">
|<hex from="9-11">
5 |<_2_hex-digit-plus mark="-">…</_2_hex-digit-plus>
|</hex>
|</number>
|Alternative 2 of 2:
|<number mark="-" from="9-11">
10 |<decimal from="9-11">
|<_3_decimal-digit-plus mark="-">…</_3_decimal-digit-plus>
|</decimal>
|</number>
As before, this indicates that at the third “number” element, there are two choices “hex” or “decimal” and the first choice has been selected.
5.1.3. API XML ambiguity descriptions
The “API” XML format is designed to provide as much information as possible such that it’s easy to write XPath expressions to analyze it. The result is somewhat less easy to read.
These are the XML alternatives that function libraries and XPath expressions are
evaluated against. Here, the entire document is shown, but the actual context item
in each case is the element with the alternative
attribute.
For example:
1 |At /number-list[1]/number[3]
|Alternative 1 of 2 (selected):
|<number-list xmlns:a="https://nineml.org/ns/describe-ambiguity"
|a:version="1.0"
5 |name="number-list"
|mark="^">
|<number name="number"/>
|<number name="number"/>
|<number alternative="1" name="number" from="9" to="11" mark="-">
10 |<hex name="hex" from="9" to="11" mark="^">
|<a:nonterminal name="$2_hex-digit-plus" mark="-">…</a:nonterminal>
|</hex>
|</number>
|</number-list>
15 |Alternative 2 of 2:
|<number-list xmlns:a="https://nineml.org/ns/describe-ambiguity"
|a:version="1.0"
|name="number-list"
|mark="^">
20 |<number name="number"/>
|<number name="number"/>
|<number alternative="2" name="number" from="9" to="11" mark="-">
|<decimal name="decimal" from="9" to="11" mark="^">
|<a:nonterminal name="$3_decimal-digit-plus" mark="-">…</a:nonterminal>
25 |</decimal>
|</number>
|</number-list>
As before, this indicates that at the third “number” element, there are two choices “hex” or “decimal” and the first choice has been selected.