If you want to generate a parser, there are lots of choices of software to help you. Most follow the classic yacc model: you write a grammar in a dedicated grammar language, run it through a parser generator, and end up with executable code for your parser. Newer tools run the same way: Bison, ANTLR, and Lemon.

So I was interested to find the Spirit Parser Framework, which employs a completely different technology. Grammars are written in highly stylized C++, and executed directly. The grammar syntax is unconventional, because it's all based on C++ operator overloading. Here's part of a SQL parser:

program =  +(query) ;

query
= longest_d[ select_from_clause
| select_from_where_clause ] >> SEMI;

select_from_clause
=  select_clause >> from_clause ;

select_from_where_clause
=  select_from_clause >> where_clause ;

select_clause
= SELECT >> !(DISTINCT) >> ( STAR | var_list_clause ) ;

var_list_clause
= list_p( varname >> !alias_clause, COMMA ) ;

from_clause
=  FROM >> list_p( table_name >> !alias_clause, COMMA ) ;

Significant template wizardry is at work behind the scenes, I don't claim to understand it.

tagged: , » 5 reactions

Comments

[gravatar]
Bob 9:27 AM on 3 Sep 2004

My gut reaction to a parser generator that uses C++ templates and operator overloading? Clever but scary.

You'll get a mixture of compile-time and runtime errors when your grammar has problems. Getting a C++ compiler error for a parser grammar error isn't going to be pretty. That's not to say that this is an entirely bad idea, just that it may be, as they say "too clever by half".

Doing a quick scan of the documentation I found this question in the FAQ that speaks to the complexity of this approach:

Are there any techniques to minimize compile times using spirit. For simple parsers compile time doesn't seem to be a big issue, but I recently tried creating a parser with about 78 rules, and it took about 2 hours to compile

78 rules is a moderately complex grammar, not that unusual. Two hours to build this parser? Yikes.

Note: the response to the question mentions some ways of mitigating the problem but it's not clear how much effect these would have on the compile time.

[gravatar]
Seo Sanghyeon 9:33 AM on 3 Sep 2004

This looks really similar to pyparsing: http://pyparsing.sourceforge.net/

It uses Python's operator overloading to construct the grammar. Of course it doesn't take 2 hours to compile...

[gravatar]
Kimberley Burchett 9:51 AM on 3 Sep 2004

Reminds me of parser combinators, which are one of the main techniques for creating parsers in functional languages. E.g. check out parsec .

[gravatar]
christopher baus 1:13 PM on 3 Sep 2004

I've reviewed code that uses the spirit parser. I wish I could talk from more experience of using Spirit, but my gut feeling is that it doesn't scale well to large grammars. Plus compile times are huge.

I'm up in the air about redefining syntax via a library. The boost lambda library takes the same approach. I'm not convinced it is a good idea.

[gravatar]
Doug 9:00 AM on 4 Sep 2004

An old saying comes to mind: "Engineers can write fortran code in any language." To me, it's a bit disturbing to see C++ operator overloading used to create an entire new language, just as it was disturbing to see C macros that made C more like fortran.

Add a comment:

name
email
Ignore this:
not displayed and no spam.
Leave this empty:
www
not searched.
 
Name and either email or www are required.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.