Lingua.NET is a parser generator that uses code-based grammar definitions. Most parser generators read a text-based grammar specification and emit source code that must then be compiled. Lingua.NET instead uses reflection to extract the grammar from an assembly and build the corresponding parser.

A grammar consists of three primary elements:
  • Terminals or tokens. These define the individual "words" to be recognized by the parser. For a grammar that specifies a programming language such as C#, terminals would include such things as:
    • Literals such as strings and numeric constants.
    • Symbols such as variables and procedure names.
    • Operators such as + and -.
    • Punctuation such as { and }.
  • Nonterminals. These represent the allowed "phrases" to be recognized by the parser.
  • Rules or productions. These specify how nonterminals are constructed from other nonterminals and terminals. For example:
    • expression ::= numeric_constant;
    • expression ::= boolean_expression;
    • boolean_expression ::= expression boolean_operator expression;
    • boolean_operator ::= op_addition;
    • boolean_operator ::= op_subtraction;

Specifying a Grammar

Terminals, nonterminals and rules are defined by classes and static methods, as illustrated below.


public class Number : Terminal { }
A terminal is a class that:
  • Inherits directly or indirectly from Terminal.
  • Has a public default constructor.
  • Is adorned with a TerminalAttribute containing the regular expression that defines the terminal.
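
The three requirements above can be illustrated with a small, self-contained sketch. It does not use Lingua.NET itself: the Terminal base class, PatternAttribute, TerminalMatcher and the digit pattern are all hypothetical stand-ins invented for this example, and the library's own attribute and matching logic may differ. The sketch only shows why a default constructor and a pattern-carrying attribute are enough for a reader to recognize and instantiate a terminal through reflection.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for this sketch only; these are NOT the Lingua.NET types.
[AttributeUsage(AttributeTargets.Class)]
public sealed class PatternAttribute : Attribute
{
    public PatternAttribute(string pattern) { Pattern = pattern; }
    public string Pattern { get; private set; }
}

public abstract class Terminal
{
    public string Text { get; set; }
}

[Pattern(@"[0-9]+")]   // assumed pattern: one or more decimal digits
public class Number : Terminal { }

public static class TerminalMatcher
{
    // Tries to read terminal type T from the start of input; returns null if it does not match.
    // The new() constraint mirrors the "public default constructor" requirement.
    public static T TryRead<T>(string input) where T : Terminal, new()
    {
        var attribute = (PatternAttribute)Attribute.GetCustomAttribute(typeof(T), typeof(PatternAttribute));
        if (attribute == null) return null;

        // \G anchors the match at the position where matching starts (index 0 here).
        var match = Regex.Match(input, @"\G(?:" + attribute.Pattern + ")");
        if (!match.Success) return null;

        return new T { Text = match.Value };
    }
}

public static class Program
{
    public static void Main()
    {
        Number number = TerminalMatcher.TryRead<Number>("42 + 1");
        Console.WriteLine(number.Text);   // prints "42"
    }
}
```

Reading the pattern from an attribute keeps the terminal's definition in one place, the class itself, which is the idea behind a code-based grammar.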


public class BooleanOperator : Nonterminal { }
A nonterminal is a class that:
  • Inherits directly or indirectly from Nonterminal.
  • Has a public default constructor.


public class BooleanOperator : Nonterminal
{
    public static void Rule(BooleanOperator result, OperatorAddition op) { /* Code */ }
}
A rule is a static method that:
  • Has at least one parameter.
  • Specifies a nonterminal as its first parameter; this is the left-hand side of the production.
  • Specifies either terminals or nonterminals for all remaining parameters; these form the right-hand side.
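
To make the mapping from static methods to productions concrete, here is a minimal sketch of how rules of this shape could be discovered with reflection. It is an illustration under stated assumptions, not Lingua.NET's implementation: Terminal, Nonterminal and OperatorAddition are stand-in types declared locally, and RuleScanner is a hypothetical helper that applies the three criteria listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in base classes for this sketch only; these are NOT the Lingua.NET types.
public abstract class Terminal { }
public abstract class Nonterminal { }

public class OperatorAddition : Terminal { }

public class BooleanOperator : Nonterminal
{
    // Corresponds to: boolean_operator ::= op_addition;
    public static void Rule(BooleanOperator result, OperatorAddition op) { }
}

public static class RuleScanner
{
    // Returns one "lhs ::= rhs" string per public static method shaped like a rule.
    public static List<string> FindRules(Type type)
    {
        var rules = new List<string>();
        foreach (MethodInfo method in type.GetMethods(BindingFlags.Public | BindingFlags.Static))
        {
            ParameterInfo[] parameters = method.GetParameters();

            // A rule has at least one parameter.
            if (parameters.Length == 0) continue;

            // The first parameter must be a nonterminal (the left-hand side).
            if (!typeof(Nonterminal).IsAssignableFrom(parameters[0].ParameterType)) continue;

            // Every remaining parameter must be a terminal or a nonterminal (the right-hand side).
            bool valid = parameters.Skip(1).All(p =>
                typeof(Terminal).IsAssignableFrom(p.ParameterType) ||
                typeof(Nonterminal).IsAssignableFrom(p.ParameterType));
            if (!valid) continue;

            string rhs = string.Join(" ", parameters.Skip(1).Select(p => p.ParameterType.Name));
            rules.Add(parameters[0].ParameterType.Name + " ::= " + rhs);
        }
        return rules;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string rule in RuleScanner.FindRules(typeof(BooleanOperator)))
            Console.WriteLine(rule);   // prints "BooleanOperator ::= OperatorAddition"
    }
}
```

Because the parameter types carry the grammar symbols, no separate grammar file is needed; the method signature is the production.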

Using Lingua.NET

Once a grammar has been defined, use Lingua.NET to generate a terminal reader and parser.

Load Grammar

Construct a Grammar object and read in the grammar defined within the specified assembly.
Assembly assembly = Assembly.GetAssembly(typeof(App));

Grammar grammar = new Grammar();

Generate Parser and Terminal Reader

The grammar is used to construct a terminal reader and parser.
ITerminalReaderGenerator terminalReaderGenerator = new TerminalReaderGenerator();
TerminalReaderGeneratorResult terminalReaderGeneratorResult = terminalReaderGenerator.GenerateTerminalReader(grammar);
ITerminalReader terminalReader = terminalReaderGeneratorResult.TerminalReader;

IParserGenerator parserGenerator = new ParserGenerator();
ParserGeneratorResult parserGeneratorResult = parserGenerator.GenerateParser(grammar);
IParser parser = parserGeneratorResult.Parser;

Open Terminal Reader and Parse Terminal Stream

The terminal reader and parser are used to process the desired text.
Start result = parser.Parse(terminalReader);

Last edited Jan 7, 2010 at 5:00 AM by rtodd, version 9

