Rod Waldhoff's Weblog  

< Monday, 7 July 2003 >
Wanted: Modular/Extensible Parser Generator

Over at the Axion project, we're using a JavaCC grammar to implement the SQL parser.

One of the things we've found Axion to be good for is unit testing database applications. In other words, one can use an in-memory, in-process Axion database as a "mock" replacement for a regular production database. In this case, it's quite useful to have Axion closely mimic the syntax of other RDBMSs. For example, Axion supports LIMIT and OFFSET clauses like PostgreSQL and MySQL as well as a ROWNUM pseudo-column like Oracle. Similarly, Axion supports the ISO SQL 99 syntax for outer joins (FROM a LEFT OUTER JOIN b ON a.id = b.id), but it would be nice to support Oracle's custom syntax (FROM a, b WHERE a.id = b.id(+)) as well.

Supporting the idiosyncrasies of several popular database engines in a single grammar file seems cumbersome at best, and probably impossible. It bloats our keyword namespace, and eventually it must lead to conflicts. (For example, if I'm trying to unit test code that eventually interacts with an Oracle database, then the "mock" database shouldn't accept LIMIT and OFFSET clauses, and shouldn't consider those to be keywords either.)
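To make the keyword problem concrete, here is a minimal sketch. The class name, method names, and keyword lists are all invented for illustration; they are toy sets, not Axion's token table or any engine's real reserved-word list. It shows how a merged grammar ends up reserving every dialect's keywords at once.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class KeywordSets {
    // Toy keyword lists for illustration only; not any engine's real reserved-word list.
    static final Set<String> CORE = setOf("SELECT", "FROM", "WHERE");
    static final Set<String> MYSQL_LIKE = union(CORE, setOf("LIMIT", "OFFSET"));
    static final Set<String> ORACLE_LIKE = union(CORE, setOf("ROWNUM"));

    // What a single all-dialects grammar file ends up reserving.
    static final Set<String> MERGED = union(MYSQL_LIKE, ORACLE_LIKE);

    static Set<String> setOf(String... words) {
        return new HashSet<>(Arrays.asList(words));
    }

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return all;
    }

    // A word can name a table or column only if the active keyword set does not reserve it.
    static boolean isLegalIdentifier(String word, Set<String> keywords) {
        return !keywords.contains(word.toUpperCase(Locale.ROOT));
    }
}
```

With these sets, a column named limit is a legal identifier under ORACLE_LIKE but not under MERGED, which is exactly the kind of false rejection a "mock" database should avoid.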

Axion's design is modular enough to allow for pluggable parser implementations. Indeed, anything that implements:

interface Parser {
  AxionCommand parse(String sql) throws AxionException;
}

can be dropped right in. Hence it is straightforward to define, for example, MySqlSyntax.jj, OracleSyntax.jj, SqlServerSyntax.jj, etc. to support a specific SQL dialect. The trouble is, each of those files is going to be 90% or more the same. What I'd like is a clean mechanism for either:

  • declaring SpecializedGrammar to extend from GeneralizedGrammar, perhaps with abstract productions and the like, or
  • declaring SpecializedGrammar to be the composition of SubGrammarA and SubGrammarB

or both. I'm especially interested in being able to combine grammars at runtime. Anyone have any suggestions? Can you point to an example?
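To make the wish a little more concrete, here is a rough sketch of what runtime composition could look like if the grammar were plain Java objects rather than a generated parser. This is not a JavaCC, JJTree or ANTLR feature. Grammar, Rule, Rules and Dialects are hypothetical names invented for this example, and the "grammar" is a toy recognizer that only answers yes or no rather than building an AxionCommand. The key idea is that productions are looked up by name at parse time, so a sub-grammar can fill in or override a production such as rowLimit without touching the core.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// A rule tries to match tokens starting at pos; it returns the new position, or -1 on failure.
interface Rule {
    int match(Grammar g, List<String> tokens, int pos);
}

final class Grammar {
    private final Map<String, Rule> productions = new HashMap<>();

    Grammar define(String name, Rule rule) {
        productions.put(name, rule);
        return this;
    }

    // Composition/extension: copy this grammar's productions, then let other's add to or override them.
    Grammar composedWith(Grammar other) {
        Grammar combined = new Grammar();
        combined.productions.putAll(this.productions);
        combined.productions.putAll(other.productions);
        return combined;
    }

    Rule lookup(String name) {
        Rule rule = productions.get(name);
        if (rule == null) throw new IllegalStateException("Undefined production: " + name);
        return rule;
    }

    // True if the whole whitespace-separated input matches the named start production.
    boolean accepts(String start, String input) {
        List<String> tokens = Arrays.asList(input.trim().toUpperCase(Locale.ROOT).split("\\s+"));
        return lookup(start).match(this, tokens, 0) == tokens.size();
    }
}

final class Rules {
    static Rule word(String w) {
        return (g, t, p) -> p < t.size() && t.get(p).equals(w) ? p + 1 : -1;
    }
    static Rule any() {
        return (g, t, p) -> p < t.size() ? p + 1 : -1;
    }
    // Late-bound reference: resolved against whichever grammar is running the parse.
    static Rule ref(String name) {
        return (g, t, p) -> g.lookup(name).match(g, t, p);
    }
    static Rule seq(Rule... parts) {
        return (g, t, p) -> {
            int pos = p;
            for (Rule part : parts) {
                pos = part.match(g, t, pos);
                if (pos < 0) return -1;
            }
            return pos;
        };
    }
    static Rule opt(Rule rule) {
        return (g, t, p) -> {
            int next = rule.match(g, t, p);
            return next < 0 ? p : next;
        };
    }
    static Rule nothing() {
        return (g, t, p) -> p;
    }
}

final class Dialects {
    // Core grammar: "rowLimit" plays the role of an abstract production with an empty default.
    static Grammar core() {
        return new Grammar()
            .define("select", Rules.seq(Rules.word("SELECT"), Rules.any(),
                                        Rules.word("FROM"), Rules.any(), Rules.ref("rowLimit")))
            .define("rowLimit", Rules.nothing());
    }
    // Sub-grammar contributing only a LIMIT clause.
    static Grammar limitClause() {
        return new Grammar()
            .define("rowLimit", Rules.opt(Rules.seq(Rules.word("LIMIT"), Rules.any())));
    }
}
```

Given these definitions, Dialects.core().composedWith(Dialects.limitClause()) accepts "SELECT x FROM t LIMIT 5", while Dialects.core() on its own rejects it and still accepts "SELECT x FROM t". Whether a parser generator can offer the same late binding between productions is the open question.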

(I'll be honest, I'm not much of a *CC expert. This might be straightforward in JJTree or ANTLR; I just haven't dug into it much. I'm pretty sure it's not straightforward in plain old JavaCC.)