Introduction

Lingua.NET is a parser generator that uses code-based grammar definitions. Parser generators typically read a text-based grammar specification and emit source code that is subsequently compiled. Lingua.NET instead uses reflection to extract the grammar from an assembly and create the corresponding parser.

A grammar consists of three primary elements:
  • Terminals or tokens. These define the individual "words" to be recognized by the parser. For a grammar that specifies a programming language such as C#, terminals would include such things as:
    • Literals such as strings and numeric constants.
    • Symbols such as variable and procedure names.
    • Operators such as + and -.
    • Punctuation such as { and }.
  • Nonterminals. These represent the allowed "phrases" to be recognized by the parser.
  • Rules or productions. These specify how nonterminals are constructed from other nonterminals and terminals. For example:
    • expression ::= numeric_constant;
    • expression ::= boolean_expression;
    • boolean_expression ::= expression boolean_operator expression;
    • boolean_operator ::= op_addition;
    • boolean_operator ::= op_subtraction;

Specifying a Grammar

Terminals, nonterminals and rules are defined by classes and static methods as illustrated below.

Terminals

[Terminal(@"\d+")]
public class Number : Terminal
{
}

A terminal is a class that:
  • Inherits directly or indirectly from Terminal.
  • Has a public default constructor.
  • Is adorned with a TerminalAttribute containing the regular expression that defines the terminal.
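
To make the role of the regular expression concrete, the following minimal sketch applies the same \d+ pattern to an input string using only System.Text.RegularExpressions. It does not use Lingua.NET at all: the class name TerminalPatternDemo, the helper MatchNumberAt, and the use of the \G anchor to force a match at a given position are assumptions made for illustration, and the terminal reader that Lingua.NET generates may work differently.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TerminalPatternDemo
{
    // Same pattern as in [Terminal(@"\d+")]. The leading \G is an
    // illustrative assumption: it anchors the match to the start position.
    private static readonly Regex NumberPattern = new Regex(@"\G\d+");

    // Hypothetical helper: returns the text of a Number terminal that
    // starts exactly at 'position', or null if none starts there.
    public static string MatchNumberAt(string input, int position)
    {
        Match match = NumberPattern.Match(input, position);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(MatchNumberAt("123+45", 0));         // 123
        Console.WriteLine(MatchNumberAt("123+45", 3) == null); // True ('+' is not a digit)
        Console.WriteLine(MatchNumberAt("123+45", 4));         // 45
    }
}
```

The pattern consumes the longest run of digits at the requested position, which is the behavior one would expect from a Number terminal.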

Nonterminals

public class BooleanOperator : Nonterminal
{
}

A nonterminal is a class that:
  • Inherits directly or indirectly from Nonterminal.
  • Has a public default constructor.

Rules

public class BooleanOperator : Nonterminal
{
    public static void Rule(BooleanOperator result, OperatorAddition op)
    {
        // Code
    }
}

A rule is a static method that:
  • Has at least one parameter.
  • Specifies a nonterminal as its first parameter.
  • Specifies either terminals or nonterminals for all remaining parameters.
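
The three conditions above can be checked with ordinary .NET reflection. The sketch below is self-contained and is not Lingua.NET's actual implementation: the Terminal and Nonterminal classes are empty stand-ins declared only so the example compiles on its own, and RuleDiscoveryDemo, LooksLikeRule and Describe are hypothetical names. It also assumes that the first parameter plays the role of the left-hand side of the production, as the parameter name "result" in the example suggests.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-ins for the Lingua.NET base classes (assumption: the real ones
// carry more behavior; only the inheritance relationship matters here).
public abstract class Terminal { }
public abstract class Nonterminal { }

public class OperatorAddition : Terminal { }

public class BooleanOperator : Nonterminal
{
    public static void Rule(BooleanOperator result, OperatorAddition op) { }
}

public static class RuleDiscoveryDemo
{
    // Applies the three listed conditions to a single method.
    public static bool LooksLikeRule(MethodInfo method)
    {
        ParameterInfo[] parameters = method.GetParameters();
        if (!method.IsStatic || parameters.Length == 0)
            return false;
        if (!typeof(Nonterminal).IsAssignableFrom(parameters[0].ParameterType))
            return false;
        return parameters.Skip(1).All(p =>
            typeof(Terminal).IsAssignableFrom(p.ParameterType) ||
            typeof(Nonterminal).IsAssignableFrom(p.ParameterType));
    }

    // Renders a rule method in the "lhs ::= rhs;" notation used earlier.
    public static string Describe(MethodInfo method)
    {
        ParameterInfo[] parameters = method.GetParameters();
        string rhs = string.Join(" ",
            parameters.Skip(1).Select(p => p.ParameterType.Name));
        return parameters[0].ParameterType.Name + " ::= " + rhs + ";";
    }

    public static void Main()
    {
        foreach (MethodInfo method in typeof(BooleanOperator)
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Where(LooksLikeRule))
        {
            Console.WriteLine(Describe(method)); // BooleanOperator ::= OperatorAddition;
        }
    }
}
```

Read this way, the Rule method shown above corresponds to the production boolean_operator ::= op_addition from the introduction.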

Using Lingua.NET

Once a grammar has been defined, use Lingua.NET to generate a terminal reader and parser.

Load Grammar

Construct a Grammar object and read in the grammar defined within the specified assembly. In this example, App is a type declared in that assembly.

Assembly assembly = Assembly.GetAssembly(typeof(App));

Grammar grammar = new Grammar();
grammar.Load(assembly);
grammar.LoadRules(assembly);
grammar.Resolve();

Generate Parser and Terminal Reader

The grammar is used to construct a terminal reader and parser.

ITerminalReaderGenerator terminalReaderGenerator = new TerminalReaderGenerator();
TerminalReaderGeneratorResult terminalReaderGeneratorResult = terminalReaderGenerator.GenerateTerminalReader(grammar);
ITerminalReader terminalReader = terminalReaderGeneratorResult.TerminalReader;

IParserGenerator parserGenerator = new ParserGenerator();
ParserGeneratorResult parserGeneratorResult = parserGenerator.GenerateParser(grammar);
IParser parser = parserGeneratorResult.Parser;

Open Terminal Reader and Parse Terminal Stream

The terminal reader and parser are used to process the desired text.

terminalReader.Open(txtExpression.Text);
Start result = parser.Parse(terminalReader);

Last edited Jan 7, 2010 at 4:00 AM by rtodd, version 9
