Re: SQL autocomplete project

Chris F Clark <>
Sat, 10 Oct 2009 20:37:41 -0400

          From comp.compilers

Related articles
SQL autocomplete project (console kid) (2009-10-06)
Re: SQL autocomplete project (Hans-Peter Diettrich) (2009-10-08)
Re: SQL autocomplete project (George Neuner) (2009-10-08)
Re: SQL autocomplete project (John Millaway) (2009-10-08)
Re: SQL autocomplete project (Chris F Clark) (2009-10-10)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: Sat, 10 Oct 2009 20:37:41 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 09-10-005
Keywords: SQL, parse
Posted-Date: 10 Oct 2009 23:09:36 EDT

console kid <> writes:

> I have a list of SQL grammar

Ok, so 1st determine if the grammar is LL or LR. Translate the
grammar used into the notation of you favorite LL parser generator.
If you don't already have one, you might try ANTLR, COCO/R, or
JavaCC--those are common popular choices. If the parser generator
accepts your grammar without wrnings or errors, it is LL. If not, it
is probably LR, and you should translate the grammar into the notation
of an LR parser generator. Note, one hint is that the grammar is
already in the notation of some parser generator, and that generator
will likely be either LL or LR.

For the sake of the next step, let us assume you have an LL grammar
and you can generate an LL parser. I'm dealing with this case first,
because LL parser generators are currently more popular. Now, there
are two possibilites, your parser generates "tables" (not likely in
the LL case, since it is so easy to generate recursive descent code
from an LL grammar).

So, staying with this case, you probably have a recursive descent
parser generated as code. It will probably look something like this C

int select_stmt() {
    token tk = get_token(); // 1
    if ( tk == SELECT ) {
          tk = get_token(); // 2
          if ( tk != FROM ) return ERROR;
          return select_stmt;
      else if ( tk == FROM ) {
      else return ERROR;

Each place get_token() is called, you can print out what is expected
for that token and what remains to be parsed. For example, at the
get_token() call I have labelled 1. You are expecting either a SELECT
token or a FROM token. The clause you are expecting is either:

{.SELECT selectPart FROM fromPart |
  .FROM fromPart SELECT )

The dot shows where you are in the parse and is before the token (or
non-terminal) expected next.

If you are at the 2nd get_token() call, you would print this:

SELECT selectPart .FROM fromPart

There are some details about walking back up the recursion stack in
recursive descent that make this a little more cumbersome, but they
can easily be resolved. I leave them as an exercise to the reader, or
you can ask again, if you go down that path. (But I'll give you a
hint, traversing up the stack finds the same "parent rules" that
Yacc++ generates for LL grammars.)

Note, the process is actually easier if you have a table driven
parser, because in that case, the stack is maintained outside the call
stack and you can simply traverse it. However, there are few table
driven LL parser generators, because recursive descent is so easy to
generate and it pushes many of the inmplementation issues down onto
the host language.

So, now let's investigate the other case, where one has an LR grammar
and is using an LR table generator. Common choices for that are Yacc,
Bison, and Yacc++, which I am biased toward as one of the authors. In
particular, you will find most (if not all) the support for doing what
you want built into Yacc++.

Note, your description here, sounds a lot like you are attempting to
use an LR tool.

> so according to the rules in that
> grammar file the program will read that file
> and build internal NFA graph , which every NFA state contains a string
> of suggestion syntax.

An LR parser generator builds up a machine that looks line an NFA--in
fact, it is an NFA for "viable prefixes" at least if interpreted
correctly. Some theorists will object to that characterization,
because part of the correct interpretation relies on building a stack,
which means its not a true FA in the classical sense.

The automaton that an LR parser generator builds consists of states,
and those states are generally described by "dotted items" (exactly
what we were outputing before). So, at any state, the LR parser
generato has calculated and can tell you exactly what dotted items to
output. One simply writes out those items for the current state and
then one finds the state in the "parent rule" (the rule which called
the current rule).

Finding the parent rule is the only tricky part. In Yacc++, it isn't
so tricky because we do only one push per rule, so you simply pop the
stack once and look at the state on the top of the stack--that is your
parent rule state. You repeat that until the stack is empty. (Note
Yacc++ isn't the only LR parser generator to work this way--there was
a paper someone published showing how to apply this as an optimization
to LR machines.)

If you are using a traditional LR parser generator, the popping is a
little more tricky. The number of items one pops are related to the
length of the prefix of the rule recognized. Thus, if we are looking
at the third token in a rule, we pop two entries from the stack to get
back to the state in the parent rule. Again, one repeats until the
stack is empty.

There are a couple of subtle points I've glossed over. That will give
you something to work on, but the base algorithm is above for either
language family.

Note, if you want to use Yacc++ for your project, just send me an
email. We now give away free copies for academic/open source use. We
only require that if you develop something that you make the result
available as open source. You just need to write and ask and then
wait for me to respond, which isn't always fast as it isn't my primary
focus these days.

Hope this helps,

Chris Clark email:
Compiler Resources, Inc. Web Site:
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.