Chapter 9 Pragmas

It’s possible to influence the behavior of the processor by placing pragmas in your grammar.

Experimental

Pragmas are a separate feature; they are not part of Invisible XML 1.0. As of 4 September 2022, the pragma syntax accepted by CoffeeFilter (and CoffeePot) follows the grammar described in “Designing for change: Pragmas in Invisible XML as an extensibility mechanism”, presented at Balisage 2022.

If you run CoffeePot with the --pedantic option, you cannot use pragmas.

A pragma begins with “{[”, continues with a pragma name and pragma data (which may be empty), and closes with “]}”. The pragma name is a shortcut for a URI which provides the “real” identity of the pragma. This mechanism leverages URI space to achieve distributed extensibility.

The mapping from names to URIs is done with the “pragma” pragma at the top of your grammar. This, for example, declares the name “nineml” as the pragma identified by the URI “https://nineml.org/ns/pragma/”:

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}

CoffeePot ignores any pragmas it does not recognize. The rest of this document assumes that you have declared the pragma name “nineml” as shown above. You must do this in every grammar file where you use pragmas.

Pragmas can be associated with the entire grammar or with a rule, a nonterminal symbol, or a terminal symbol:

  1. A pragma placed before a symbol applies to the symbol that follows it:

      |rule: {[pragma applies to “A”]} A,
      |      {[pragma applies to “b”]} 'b'.
    
  2. A pragma placed before a rule applies to the rule that follows it:

      |{[pragma applies to “rule”]}
      |rule: {[pragma applies to “A”]} A,
      |      {[pragma applies to “b”]} 'b'.
    
  3. To apply a pragma to the entire grammar, it must be in the prolog.

      |{[+pragma applies to whole grammar]}
      | 
      |{[pragma applies to “rule”]}
      |rule: {[pragma applies to “A”]} A,
      |      {[pragma applies to “b”]} 'b'.
    

More than one pragma can appear at any of those locations:

  |{[+pragma applies to whole grammar]}
  |{[+second pragma applies to whole grammar ]}
  | 
  |{[pragma applies to “rule”]}
  |{[second pragma applies to “rule”]}
  |rule:
  |   {[pragma applies to “A”]}
  |   {[second pragma applies to “A”]} A,
  |   {[pragma applies to “b”]}
  |   {[second pragma applies to “b”]} 'b'.

If a pragma is not recognized, or does not apply, it is ignored. CoffeePot will generate debug-level log messages to alert you to pragmas that it is ignoring.
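
As a concrete sketch of these positions, the following grammar uses pragmas described later in this chapter: ns in the prolog, discard empty before a rule, and rename before a symbol. The grammar itself and the example.com namespace URI are hypothetical, chosen only for illustration.

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}
  |{[+nineml ns "http://example.com/ns/settings"]}
  | 
  |setting: name, -"=", {[nineml rename content]} value .
  |name: ["a"-"z"]+ .
  |{[nineml discard empty]}
  |value: ["a"-"z" | "0"-"9"]* .

Here ns applies to the whole grammar, discard empty applies to the rule for value, and rename applies only to the reference to value inside setting.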

9.1 Grammar pragmas

The following pragmas apply to a grammar as a whole.

9.1.1 csv-columns

Identifies the columns to be output when CSV output is selected.

Usage:

  |{[+nineml csv-columns list,of,names]}

Ordinarily, CSV formatted output includes all the columns in (roughly) the order they occur in the XML. This pragma allows you to list the columns you want output and the order in which you want them output.

If a requested column does not exist in the document, it is ignored; no empty column is produced in its place.
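
For example, a grammar along these lines (hypothetical, for illustration) produces given, family, and age elements for each person. The csv-columns pragma asks for only two of those columns, with family first:

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}
  |{[+nineml csv-columns family,given]}
  | 
  |people: person+ .
  |person: given, -" ", family, -" ", age, -#a .
  |given: ["A"-"Z" | "a"-"z"]+ .
  |family: ["A"-"Z" | "a"-"z"]+ .
  |age: ["0"-"9"]+ .

With CSV output selected, the age column should be omitted and the family column should precede the given column.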

9.1.2 import

Allows one grammar to import another.

Usage:

  |{[+nineml import "grammar-uri"]}

In principle, this pragma allows you to combine grammars. This feature is experimental and no coherent semantics have yet been established.

9.1.3 ns

Declares the default namespace for the output XML.

Usage:

  |{[+nineml ns "namespace-uri"]}

9.1.4 record-end

The record-end pragma enables record-oriented processing by default. Its value is the regular expression that marks record ends. Unlike the other pragmas, this one has a different URI binding, which must be declared under its own name.

Usage:

  |{[+pragma opt "https://nineml.org/ns/pragma/options/"]}
  |{[+opt record-end "\n([^ ])"]}

9.1.5 record-start

The record-start pragma enables record-oriented processing by default. Its value is the regular expression that marks record starts. Unlike the other pragmas, this one has a different URI binding, which must be declared under its own name.

Usage:

  |{[+pragma opt "https://nineml.org/ns/pragma/options/"]}
  |{[+opt record-start "([^\\])\n"]}

9.2 Rule pragmas

The following pragmas apply to rules.

9.2.1 csv-heading

Specifies the heading title to use in CSV output if the nonterminal defined by this rule is used as the value of a column.

Usage:

  |{[nineml csv-heading "Heading Title"]}
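
As a sketch, assuming the pragma name matches the section title (csv-heading), a rule can supply a more readable column title than its nonterminal name. The rule and the title here are hypothetical:

  |{[nineml csv-heading "Family name"]}
  |family: ["A"-"Z" | "a"-"z"]+ .

In CSV output, the column holding family values would then be headed “Family name” rather than “family”.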

9.2.2 discard-empty

If the nonterminal defined by this rule is empty, it will be discarded (not serialized at all).

Usage:

  |{[nineml discard empty]}
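
For instance, in this small grammar (a hypothetical example), an item may match zero characters:

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}
  | 
  |list: item, (-",", item)* .
  |{[nineml discard empty]}
  |item: ["a"-"z"]* .

For the input “a,,b”, the second item matches nothing. Without the pragma it would be serialized as an empty item element; with the pragma, only the two non-empty item elements should appear in the output.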

9.3 Symbol pragmas

The following pragmas apply to symbols.

9.3.1 rename

This pragma changes the name used when the element is serialized. It applies only to nonterminals.

Usage:

  |{[nineml rename newname]}
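
Because the pragma is attached to a symbol rather than to a rule, the same nonterminal can be serialized under different names in different places. A minimal sketch, using a hypothetical grammar:

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}
  | 
  |date: {[nineml rename year]} digits, -"-", {[nineml rename month]} digits .
  |digits: ["0"-"9"]+ .

For the input “2022-09”, the first digits should be serialized as a year element and the second as a month element.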

9.3.2 rewrite

This pragma changes the text that is output when the terminal is serialized. It applies only to terminals.

Usage:

  |{[nineml rewrite "new literal"]}

This pragma was invented before Invisible XML added insertions. The same effects can be obtained with insertions (by suppressing one terminal and inserting another). That is the preferred approach.
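
The two approaches can be compared side by side. Both grammars below are hypothetical sketches that match “&” between two words and output “ and ” in its place. The first uses the pragma:

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}
  | 
  |pair: word, {[nineml rewrite " and "]} "&", word .
  |word: ["a"-"z"]+ .

The second uses only standard Invisible XML, deleting the matched terminal with “-” and inserting the replacement with “+”:

  |pair: word, -"&", +" and ", word .
  |word: ["a"-"z"]+ .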

9.3.3 priority

This pragma associates a priority with a nonterminal.

Usage:

  |{[nineml priority 1.5]}

When an ambiguous parse is being serialized, there will be places in the output where a choice must be made between two or more alternatives. A priority can be used to control the selection. The nonterminal with the highest priority will be selected. If there are no priorities, or if several nonterminals have the same priority, no guarantees are made about which alternative will be selected. The default priority for all nonterminals is 0.

Consider the following grammar:

  | number: hex | decimal .
  | hex: hex-digit+ .
  | decimal: decimal-digit+ .
  |-hex-digit: ["0"-"9" | "a"-"f" | "A"-"F" ] .
  |-decimal-digit: ["0"-"9" ] .

It parses numbers in either hexadecimal or decimal. In the case of a number like “42”, the parse is ambiguous: it matches either hex or decimal.

  |$ coffeepot --pretty-print -g:hex.ixml 42
  |Found 2 possible parses.
  |<number xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <hex>42</hex>
  |</number>

You can give decimal a higher priority:

  |{[+pragma nineml "https://nineml.org/ns/pragma/"]}
  | 
  | number: hex | {[nineml priority 2]} decimal .
  | hex: hex-digit+ .
  | decimal: decimal-digit+ .
  |-hex-digit: ["0"-"9" | "a"-"f" | "A"-"F" ] .
  |-decimal-digit: ["0"-"9" ] .

Now decimal will be selected:

  |$ coffeepot --pretty-print -g:hex.ixml 42
  |Found 2 possible parses.
  |<number xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  |   <decimal>42</decimal>
  |</number>

The parse is still considered ambiguous because there were two possible choices.

Note

If a grammar is infinitely ambiguous, the same part of the parse may be serialized more than once. When this happens, the selection is always between the remaining alternatives.