Objective-C Parser Generation via Grammars

Basic Grammar Syntax

ParseKit allows users to build parsers for custom languages from a declarative, BNF-style grammar without writing any code (well, ok.. a single line of code). Under the hood, grammar support is implemented using the ParseKit Objective-C API, so the grammar syntax closely mirrors the features of the Objective-C API.

The grammar below describes a simple toy language called Cold Beer and will serve as a quick introduction to the ParseKit grammar syntax. The rules of the Cold Beer language are as follows. The language consists of a sequence of one or more sentences. Each sentence begins with the word «cold», continues with zero or more repetitions of either «cold» or «freezing», then the word «beer», and is terminated by the symbol «.».

For example, each of the following lines is a valid instance of the Cold Beer language (as is the example as a whole):

    cold cold cold freezing cold freezing cold beer.
    cold cold freezing cold beer.
    cold freezing beer.
    cold beer.

The following lines are not valid Cold Beer statements:

    freezing cold beer.
    cold freezing beer
    beer.

Here is a complete ParseKit grammar for the Cold Beer language.

    @start = sentence+;
    sentence = adjectives 'beer' '.';
    adjectives = cold adjective*;
    adjective = cold | freezing;
    cold = 'cold';
    freezing = 'freezing';

As shown above, the ParseKit grammar syntax consists of individual language production declarations separated by «;». Whitespace is ignored, so the productions can be formatted liberally with whitespace as the programmer prefers. Comments are also allowed and resemble the comment style of Objective-C. So a commented Cold Beer grammar may appear as:

    /*
        A Grammar for the Cold Beer Language
        by Todd Ditchendorf
    */
    @start = sentence+;     // outermost production
    sentence = adjectives 'beer' '.';
    adjectives = cold adjective*;
    adjective = cold | freezing;
    cold = 'cold';
    freezing = 'freezing';

Individual Grammar Production Syntax

Every ParseKit grammar must contain one and only one production named @start. This will be the highest-level or outermost production rule in the language. For Cold Beer, the outermost production is:

    @start = sentence+;

This states that the outermost production of the language consists of a sequence of one or more («+») instances of the sentence production.

    sentence = adjectives 'beer' '.';

The sentence production states that a sentence is a sequence of the adjectives production followed by the literal strings «beer» and «.».

    adjectives = cold adjective*;

In turn, adjectives is a sequence of a single instance of the cold production followed by a repetition («*» read as ‘zero or more’) of the adjective production.

    adjective = cold | freezing;
    cold = 'cold';
    freezing = 'freezing';

The adjective production is an alternation («|») of the cold production and the freezing production. The cold production is the literal string «cold», and the freezing production is the literal string «freezing».
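
The way these productions compose can be sketched as a small hand-written recognizer. The following is only an illustration in plain C (which also compiles as Objective-C). It is not ParseKit code and makes no claim about how ParseKit is implemented. The names Cursor, literal and is_cold_beer are hypothetical, and tokenizing is simplified to whitespace skipping plus literal comparison. Each function corresponds to one production of the grammar.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over the input text. */
typedef struct { const char *p; } Cursor;

static void skip_ws(Cursor *c) {
    while (isspace((unsigned char)*c->p)) c->p++;
}

/* Match one literal token; advance past it only on success. */
static bool literal(Cursor *c, const char *lit) {
    skip_ws(c);
    size_t n = strlen(lit);
    if (strncmp(c->p, lit, n) != 0) return false;
    /* A word literal must not be the prefix of a longer word ("colder"). */
    if (isalpha((unsigned char)lit[0]) && isalpha((unsigned char)c->p[n]))
        return false;
    c->p += n;
    return true;
}

/* cold = 'cold'; */
static bool cold(Cursor *c) { return literal(c, "cold"); }

/* freezing = 'freezing'; */
static bool freezing(Cursor *c) { return literal(c, "freezing"); }

/* adjective = cold | freezing; */
static bool adjective(Cursor *c) { return cold(c) || freezing(c); }

/* adjectives = cold adjective*; */
static bool adjectives(Cursor *c) {
    if (!cold(c)) return false;
    while (adjective(c)) { /* zero or more repetitions */ }
    return true;
}

/* sentence = adjectives 'beer' '.'; */
static bool sentence(Cursor *c) {
    return adjectives(c) && literal(c, "beer") && literal(c, ".");
}

/* @start = sentence+; the whole input must be consumed. */
bool is_cold_beer(const char *text) {
    Cursor c = { text };
    if (!sentence(&c)) return false;      /* at least one sentence */
    for (;;) {
        skip_ws(&c);
        if (*c.p == '\0') return true;    /* clean end of input */
        if (!sentence(&c)) return false;
    }
}
```

Under these assumptions, is_cold_beer("cold freezing beer.") returns true, while is_cold_beer("cold freezing beer") returns false because the terminating «.» is missing.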

Grouping

A language may be expressed in many different, yet equivalent grammars. Productions may be referenced in any order (even before they are defined) and grouped using parentheses («(» and «)»).

For example, the Cold Beer language could also be represented by the following grammar:

    @start = ('cold' ('cold' | 'freezing')* 'beer' '.')+;

Instantiating Grammar Parsers in Objective-C

Create an Objective-C PKParser object by providing the grammar as an NSString and an assembler object (in this example, self).

    NSString *g = ... // fetch your grammar from a file on disk
    PKParser *parser = nil;
    parser = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
    NSString *s = @"cold freezing cold beer.";
    [parser parse:s];

The provided assembler object will receive callbacks whenever one of the language grammar productions is matched — if the callback method is implemented in the assembler object.

For example, an assembler for the Cold Beer language grammar above will receive the following callbacks if implemented:

    - (void)didMatchSentence:(PKAssembly *)a;
    - (void)didMatchAdjectives:(PKAssembly *)a;
    - (void)didMatchAdjective:(PKAssembly *)a;
    - (void)didMatchCold:(PKAssembly *)a;
    - (void)didMatchFreezing:(PKAssembly *)a;

The PKAssembly argument will have the most-recently matched tokens on the top of its stack.

To prevent leaks when releasing a PKParser created via PKParserFactory, one final step is required:

    PKParser *parser = nil;
    parser = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
    ... // do parsing
    // when you are done with the parser, this function must be called before releasing
    PKReleaseSubparserTree(parser);
    [parser release];

Most non-trivial language grammars define circular relationships in their production rules. The call to PKReleaseSubparserTree() is required to prevent Objective-C retain cycle leaks where the PKParser objects representing different rules in your grammar have strong references to one another.
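
To see why a cycle of strong references leaks under manual reference counting, consider the toy model below. It is a sketch in plain C with a hypothetical Rule type and hand-rolled retain/release functions. It does not use or describe ParseKit's actual classes, and it only illustrates the general problem that a call such as PKReleaseSubparserTree() is there to solve: two rules that hold strong references to each other are never freed unless the cycle is broken first.

```c
#include <stdlib.h>

/* Hypothetical reference-counted node standing in for a grammar rule. */
typedef struct Rule {
    int refcount;
    struct Rule *child;   /* strong reference to another rule, or NULL */
} Rule;

static int live_rules = 0;   /* number of Rule objects currently allocated */

Rule *rule_new(void) {
    Rule *r = calloc(1, sizeof *r);
    if (!r) abort();
    r->refcount = 1;          /* the creator owns one reference */
    live_rules++;
    return r;
}

Rule *rule_retain(Rule *r) {
    if (r) r->refcount++;
    return r;
}

void rule_release(Rule *r) {
    if (!r || --r->refcount > 0) return;
    rule_release(r->child);   /* drop the strong reference we hold */
    free(r);
    live_rules--;
}

void rule_set_child(Rule *r, Rule *child) {
    rule_retain(child);       /* retain the new value before releasing the old */
    rule_release(r->child);
    r->child = child;
}

/* Builds expr -> term -> expr, then gives up the owner's references.
   Returns how many rules are still allocated afterwards (leaked). */
int leak_demo(int break_cycle_first) {
    int before = live_rules;
    Rule *expr = rule_new();
    Rule *term = rule_new();
    rule_set_child(expr, term);
    rule_set_child(term, expr);      /* circular grammar relationship */
    rule_release(term);              /* expr still keeps term alive */
    if (break_cycle_first)
        rule_set_child(expr, NULL);  /* break the cycle explicitly */
    rule_release(expr);
    return live_rules - before;
}
```

In this model, leak_demo(0) returns 2 because both rules keep each other alive, while leak_demo(1) returns 0 because the cycle is broken before the final release.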