Spirit parser framework

Friday 3 September 2004This is 20 years old. Be careful.

If you want to generate a parser, there are lots of choices of software to help you. Most follow the classic yacc model: you write a grammar in a dedicated grammar language, run it through a parser generator, and end up with executable code for your parser. Newer tools run the same way: Bison, ANTLR, and Lemon.

So I was interested to find the Spirit Parser Framework, which employs a completely different technology. Grammars are written in highly stylized C++, and executed directly. The grammar syntax is unconventional, because it’s all based on C++ operator overloading. Here’s part of a SQL parser:

program =  +(query) ;

query
= longest_d[ select_from_clause
| select_from_where_clause ] >> SEMI;

select_from_clause
=  select_clause >> from_clause ;

select_from_where_clause
=  select_from_clause >> where_clause ;

select_clause
= SELECT >> !(DISTINCT) >> ( STAR | var_list_clause ) ;

var_list_clause
= list_p( varname >> !alias_clause, COMMA ) ;

from_clause
=  FROM >> list_p( table_name >> !alias_clause, COMMA ) ;

Significant template wizardry is at work behind the scenes, I don’t claim to understand it.

Comments

[gravatar]
My gut reaction to a parser generator that uses C++ templates and operator overloading? Clever but scary.

You'll get a mixture of compile-time and runtime errors when your grammar has problems. Getting a C++ compiler error for a parser grammar error isn't going to be pretty. That's not to say that this is an entirely bad idea, just that it may be, as they say "too clever by half".

Doing a quick scan of the documentation I found this question in the FAQ that speaks to the complexity of this approach:

Are there any techniques to minimize compile times using spirit. For simple parsers compile time doesn't seem to be a big issue, but I recently tried creating a parser with about 78 rules, and it took about 2 hours to compile

78 rules is a moderately complex grammar, not that unusual. Two hours to build this parser? Yikes.

Note: the response to the question mentions some ways of mitigating the problem but it's not clear how much effect these would have on the compile time.
[gravatar]
This looks really similar to pyparsing: http://pyparsing.sourceforge.net/

It uses Python's operator overloading to construct the grammar. Of course it doesn't take 2 hours to compile...
[gravatar]
Reminds me of parser combinators, which are one of the main techniques for creating parsers in functional languages. E.g. check out parsec .
[gravatar]
I've reviewed code that uses the spirit parser. I wish I could talk from more experience of using Spirit, but my gut feeling is that it doesn't scale well to large grammars. Plus compile times are huge.

I'm up in the air about redefining syntax via a library. The boost lambda library takes the same approach. I'm not convinced it is a good idea.
[gravatar]
An old saying comes to mind: "Engineers can write fortran code in any language." To me, it's a bit disturbing to see C++ operator overloading used to create an entire new language, just as it was disturbing to see C macros that made C more like fortran.

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.