Ned’s Prescriptive Prefix Pushing Ploy

Thursday 14 October 2004This is 20 years old. Be careful.

Most generated parsers don’t let you select the target production at run time: they are always looking for the same syntax. But what if you want to call your parser in different ways, for example sometimes expecting a statement, and sometimes just an expression? Here’s a trick I came up with to use any parser as a switchable-start parser.

Let’s say you’re using a parser generator (such as Bison, Lemon, or PyGgy). You’ve defined a grammar for a little language with statements:

stmt ->
      ID EQ expr SEMI
    | PRINT expr SEMI
    | IF expr STMT
    | IF expr STMT ELSE stmt
    | WHILE expr STMT
    ;

expr ->
      LPAREN expr RPAREN
    | expr op expr
    | NUMBER
    | ID
    ;

op -> PLUS | MINUS | STAR | SLASH ;

(This is a very small language, expressed in PyGgy’s syntax.) The parser generated by this file will parse a statement. Let’s say you want to now call the parser to parse for just an expression. How to do it? PyGgy doesn’t have the option of specifying the start production at run time: it always starts with the first non-terminal.

Here’s a trick I came up with: Extend the grammar with a special token that indicates what to parse for. Then at run time, “tell” the parser what it’s looking for by pushing that token into the parser before the real text.

We add a new rule to the grammar:

main ->
      EXPECT_STMT stmt
    | EXPECT_EXPR expr
    ;

When we start the parser, if we’re looking for a statement, we push EXPECT_STMT as the first token in the stream. If we’re looking for an expression, we push EXPECT_EXPR. The first token switches the parser onto the proper track, and the parser takes care of the rest. Of course, this can be extended to any number of start states.

I wanted to come up with a catch alliterative name for this technique, like Duff’s Device. But there’s aren’t many useful words starting with N or B, so I’m going with Ned’s Prescriptive Prefix Pushing Ploy.

Comments

[gravatar]
Ned - don't know if you've seen Antlr - but that allows you to select the starting production, as all productions are methods on a generated class. To use your example, you'd have something like:

class SomeParser
{
void stmt();
void expr();
void op();
};

No Python, though - only C++, Java or C#...
[gravatar]
I have seen (and even used!) ANTLR. In fact, I knew one or more parser generators let you select the start production, but I couldn't remember which it was. It was probably ANTLR I was thinking of.
[gravatar]
How about calling it "Ned's Nicety"? Or maybe "Batchelder's Bait-and-Switch"? :)
[gravatar]
... Ned's Knob? ... maybe not.
[gravatar]
Here are a few:

Ned's Notion
Batchelder's Band-aid
Batchelder's Brainchild
Batchelder's Breakthrough
Batchelder's Blackbox

--bob

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.