|size of parse table? email@example.com (2000-01-21)|
|Re: size of parse table? firstname.lastname@example.org (Charles E. Bortle, Jr.) (2000-02-05)|
|Re: size of parse table? email@example.com (Chris F Clark) (2000-02-05)|
|Re: size of parse table? firstname.lastname@example.org (2000-02-15)|
|Re: size of parse table? email@example.com (Charles E. Bortle, Jr.) (2000-02-16)|
|From:||"Charles E. Bortle, Jr." <firstname.lastname@example.org>|
|Date:||5 Feb 2000 18:44:55 -0500|
At least for LL(1) grammers you are pretty close to correct in your
(This is somewhat long, so please bear with me :-)
A naive view of the parse table is a true 2 dimensional array with as
many rows as there are non-terminals and as many columns are there are
terminals. For a "small" language this might be reasonable, but for
most "real" languages this becomes unwieldy.
For instance, I have written a LL(1) parse table generator. I have
also written a grammar definition for the syntax of Turbo Pascal 3.0a.
My definition contains 186 terminal symbols (tokens), 264 non-terminal
symbols (as well as, at the current development state, 147 semantic
action symbols). Now, if the table were built naively, it would
require 49,104 entries, each containing the number of the production
(rule) being predicted. (Generally this number will be a 16bit
integer, so the table would require a minimum of 98,208 bytes). This
size itself is prohibitive, but considering that the majority of these
entries do not contain valid predicted productions, but rather
represent syntax errors, representing this as a true 2 dimensional
array is not realistic.
There are ways of representing this table as a sparse array, by means
of sets of arrays representing columns, and by other means. (set the
The way I have chosen is to represent this in a single dimensional
array containing records having three fields: 1) The non-terminal
(row), 2) The terminal (column), 3) The number of the predicted
Now, look-up is done by first finding the first entry that has the
current non-terminal (row), now, beginning at that entry, a comparison
is made to see if that entry contains the current terminal (column).
If the terminal is found, the predicted (next) production is
represented by its number in the production number field. If the
current terminal is not in the entry the next entry is checked, after
first veriying that that entry still contains the current
non-terminal. Thus, we trade linier search time for a huge space
savings. For the above Turbo Pascal grammer, my linier search table
has only 3133 entries, where each entry contains 3 (16bit) integers.
Thus my compact table reqires only 18,798 bytes instead of the 98,208
bytes required of the 2 dimensional array true table!
My parse table generator generates the tables (a number of tables
besides the parse table itself) and associated lists of constants for
terminal, non-terminal, and action symbols, in constant definitions to
be plugged in, via the Include directive, into Turbo Pascal 7.0 code
that makes up the body of the compiler I have written.
Besides the parse table itself, you need a table of the right sides of
all the productions. (I can provide more detail of my implementation
if you want it.)
Note also, you can determin the error processing (message printing,
etc) when no entry is found for the non-terminal, terminal pair, by
considering the current non-terminal, the current terminal, and other
state info, therefore you do not need a table entry for the error
condition; however, you could include grammer productions (rules) to
explicitly indicate error conditions, but this gets messy. Also, for
LL(1) processing you can, depending upon the actuall parser (driver)
routines, provide for some automatic or semi-automatic error recovery
and correction (I have no experience with this though).
My parser generator also produces a table with the actuall lexical
representation of each symbol, so you can, if you choose, incorporate
this table into the compiler, allowing the compiler to print exact
lexical representations of the symbols in error messages or in
diagnostics (such as stack cartoons). In fact, this is how I debugged
the compiler, and the design of my grammar. I turn the stack cartoons
on and off as needed to see the look-ahead symbol, the syntax stack,
and the semantic stack as the parse proceeds. The terminal and
non-terminal symbols are displayed in there lexical form so I don't
have to look up each symbol's numeric representation.
I hope this is helpful, I can provde more info via email (or in the ng
if John wants it)
BTW, besides the Pascal compiler project, I have used my parse table
generator to generate a parse table for a lexical analizer table
generator (scanner table generator). The scanner for this has been
hand coded, but once I get the scanner table generator working, I can
bootstrap by then using it to generate the scanner for the next
generation of itself.
* http://pw2.netcom.com/~cbrtjr/wrdthing.html *
> I've read that parsers generate a table of productions, with each
> state leading to several possible next states depending on the next
> token. I seem to remember that this table is a bunch of labelled
> states, and states are labelled such that you can add the ID for the
> next token to some base determined by your current state to transition
> to your next state.
> Assuming I got that right, how big is the table compared to the total
> number of (state, transition) pairs? Is the number of
> (state,transition) pairs the same as the number of states?
Return to the
Search the comp.compilers archives again.