|Writing Grammars email@example.com (Tim Newsham) (2002-07-25)|
|Re: Writing Grammars firstname.lastname@example.org (Ira Baxter) (2002-07-31)|
|Re: Writing Grammars email@example.com (Mark) (2002-07-31)|
|Re: Writing Grammars firstname.lastname@example.org (Peter H. Froehlich) (2002-07-31)|
|Re: Writing Grammars email@example.com (Tim Newsham) (2002-08-04)|
|Re: Writing Grammars firstname.lastname@example.org (SLK Parsers) (2002-08-04)|
|Re: Writing Grammars email@example.com (VBDis) (2002-08-10)|
|From:||"Ira Baxter" <firstname.lastname@example.org>|
|Date:||31 Jul 2002 00:53:21 -0400|
|Posted-Date:||31 Jul 2002 00:53:21 EDT|
"Tim Newsham" <email@example.com> wrote in message
> It seems pretty typical in compiler (and other parsing) implementation
> that you would:
> - start with a grammar of the language
> - modify the grammar to be easy to parse (ie. LALR(1))
In fact, you modify the grammar to make it possible to parse
with the parsing technology you have at hand.
> - add rules to build an abstract syntax tree from the
> resulting parse (either the parse tree, or as actions
> during the parse process).
> Now if you use a general parsing algorithm like tomita parsing or
> earley (or modified earley) parsing, you can go directly from the the
> unmodified grammar.
(We use Tomita parsers for a whole bunch of langauges
for our DMS Software Reengineering Toolkit)
You *can* do this, but in fact we rarely do.
First, many "language grammars" include syntax-constructurs that
aren't supported directly by your parsing engine. (There's also
the really annoying problem of not actually having a documented
grammar anywhere :-()
We consciously dropped Kleene star/plus and alternation from our grammars,
because they didn't really buy us that much, and when you try to
code attribute grammars or other grammar-derived tools,
you have to be extremely careful to manage these additional operators
well if you can do it all.
What we actually do is to flatten out our grammars to remove
the syntax constructors. We also often remove what amounts
to optional-rules. Since we build essentially a tree-node per rule
(our tree builder will remove unary production nodes), removing
optional-rules tends to produce fewer tree nodes overall,
and we are interested in saving space, since we often read
several million lines into a single session.
> But if you want an efficient parser, you take the time and modify the
.... as above...
> and then build actions that basically undo some of the
> ugliness of the parse (because the AST looks similar to the parse of
> the original grammar).
Pretty much we don't undo any of this. Better to operate
on a documented grammar.
> My question is: Are there any tools or research into tools to help
> automate the process of going from step 1 to step 2. It seems that
> this is a very difficult and error-prone step. Basically, software
> that would perform the same types of transformations that a human
> would perform on the grammar.
Well, whatever regular action you take on the grammar can be cast
as formal transformations on the grammar. DMS does have
a grammar for one variety of EBNF (and it is easy to define other
grammars). DMS also supports writing source-to-source transforms
over any of its defined input languages, and we use that capability
for lots of reengineering tasks. We haven't bothered to code these
rules for EBNF, mostly because nobody ever gives us a grammer
in the specific EBNF we chose, and the handwork just isn't that hard.
Typically it takes about 1/2 day to do this manually on a typical 500 rule
grammar, so it isn't that big a win compared to all the other work we
have to do.
> Even better, to also perform the same
> type of transformation on the AST building code as it goes along, so
> that the parse tree for the original grammar is built automatically in
> the actions of the modified grammar.
> Tim N.
We don't try to do this to the grammer or the parsing engine in an offline
The online parsing engine does supress unary productions
and form list-nodes from pairs of grammar rules that look like list-formers.
But we go through a great deal of trouble to hide these facts
from people writing rules, because it is only a representational detail.
The grammar should be treated as the Truth.
Ira Baxter, Ph.D. CTO Semantic Designs
Return to the
Search the comp.compilers archives again.