|Separation of parsing from AST construction and attribute evaluation firstname.lastname@example.org (2004-10-02)|
|Re: Separation of parsing from AST construction and attribute evaluati email@example.com (Richard Pennington) (2004-10-02)|
|Re: Separation of parsing from AST construction and attribute evaluati cfc@shell01.TheWorld.com (Chris F Clark) (2004-10-04)|
|Re: Separation of parsing from AST construction and attribute evaluati firstname.lastname@example.org (Rodney M. Bates) (2004-10-09)|
|From:||Richard Pennington <email@example.com>|
|Date:||2 Oct 2004 16:25:00 -0400|
|Posted-Date:||02 Oct 2004 16:25:00 EDT|
Ralph Boland wrote:
> I am designing a parser generator tool and I want it to work such that
> the specification for parsing an input language is entirely
> implementation language independant. ...
> CAN ANYONE POINT ME TO PARSER GENERATOR TOOLS THAT SEPARATE
> IMPLEMENTATION LANGUAGE FROM INPUT LANGUAGE SPECIFICATION?
> (I believe Jaccie does this; hopefully there are others.)
> IT WOULD BE GREAT IF ANYONE CAN
> POINT ME TO PAPERS ON THIS SUBJECT.
> Ralph Boland
> [Until yacc came along, parser generators tended to take a grammar,
> create the parse tables, and put them in a file. It was up to you
> to remember what rule had what number and manually match it up with
> the action code. It was a huge convenience when yacc let you put
> the grammar and the code in one file and have it write the code
> to call the action for each rule. -John]
I agree with John's point. I think that trying to separate the syntax
and actions into separate files can become a logistical nightmare.
I opted for a different approach in my tool. I'm concentrating on
simplifying the actions and putting them right into the syntax in
a manner similar to yacc.
I wanted implementation language independence also, so I added a small
interpreted language that can be used to do manipulations of the tree
before later processing.
Here is an example of an (almost) complete definition file:
The file is broken up into three sections, separated by "%%" tokens.
The first section defines the lexical tokens in the input language and
sets up some tool specific parameters.
The second section describes the syntax and defines how trees are
generated during parsing. Here is an excerpt from the syntax part:
add -> $$ = $1 // A numeric expression can be an assignment
| IDENTIFIER '=' expr -> $$ = (Assign $1 $3) // (lowest precedence).
mul -> $$ = $1
| add '+' mul -> $$ = (Add $1 $3) // Add and subtract are defined
here, with precedence
| add '-' mul -> $$ = (Subtract $1 $3)// higher than assignment.
(Sorry about the line wrapping.)
The idea is that the result of parsing
a = b - c + d;
Generates a tree:
(Assign a (Subtract b (Add c d)))
The third section of the definition file tells the tool how to
manipulate the tree. In this example, I use the tool's internal
lisp-like language (plisp) to execute the generated tree. The relevant
definitions for the tree above are:
(defun Add (arg1 arg2) (add arg1 arg2))
(defun Subtract (arg1 arg2) (sub arg1 arg2))
In this case, "add" and "sub" are plisp primitives that perform the
addition and subtraction.
How the trees generated by the input language are dealt with is
controlled by the definition file. The "use plisp" directive in the
first section of the definition file loads a shared library containing
action processing functions that implement the plisp interpreter. The
"add" and "sub" definitions above could be replaced by functions that
generate (intermediate or final) code for addition and subtraction, of
The current implementation reads the appropriate definition file when
presented with a source file (it determines which definition file to use
by the source file's extension). It would be relatively simple to have
the tool produce a parse table and appropriate action handling calls in
some arbitrary implementation language.
I think the result of this approach is that the definition file is
readable because the syntax and actions are closely related, and because
the actions are simple, high level, and easy to understand.
P.S. A couple of clarifying details:
1. The tool keeps track of source file positions for the input language
tokens as well as the definition file stuff so error messages can refer
to the appropriate place.
2. The parser handles ambiguous grammars by keeping multiple parse stacks.
2. The parser returns ambiguous parse trees as a tree:
(ambiguous <first possibility> <second possibilty> ...)
3. Tree actions can be performed at three separate times during
compilation: By default the entire source file is parsed and a syntax
tree is generated before actions are processed. The '!' and '!!'
operators modify this behavior. '!!' actions are performed immediately
during parsing. '!' actions are performed after ambiguous sections of
the tree have been accumulated. The idea is this: A '!!' action might
determine whether an identifier is a normal identifier or a type name by
consulting the symbol table. This type of action can help disambiguate a
grammar and speed up parsing. A '!' action could be used, e.g., to
determine whether a set of generated trees is an expression or a type
definition in the classic C++ disambiguating case.
4. The "error" action is special in that can be used to terminate one or
more parse stacks as well as to report compilation errors.
Return to the
Search the comp.compilers archives again.