Subject: Newbie needs YACC Help!
From: Keith Little <gklittle@sprintmail.com>
Newsgroups: comp.compilers
Date: 16 Feb 1997 22:30:55 -0500
Organization: Spectra Software
Keywords: parse

I was wondering if anyone could help me. I'm writing an analyzer for
a language very similar to C. It simply has to relate variable
declarations with their usage (but differently than a real compiler).
In an assignment statement, all Rvalues relate to the
Lvalue.
For example:
 1  INT c;                 (global)
 2
 3  INT x();               (Integer Procedure declaration)
 4  INT a;                 (variable passed by value)
 5  BEGIN
 6    INT b,
 9    b := b + 1 / a;      (1 & a relate to b)
10    c := b;              (b relates to c)
11    RETURN 1;
12  END;
The records my program would generate are:
Line  A field  B field  Verb  Operator  Dec or Use
----  -------  -------  ----  --------  ----------
1     c                                 D
3     x                                 D
4     a                                 D
6     b                                 D
9     b        1        :=    +         U
9     b        a        :=    /         U
10    c        b        :=              U
My present yacc code for parsing assignment statements and expressions
is as follows:
assign_stmt:
      assign_stmt_lev1 assign_stmt_lev2 ';'
          {
            eval_stmt();
          }
    | error ';'
    ;

assign_stmt_lev1:
      IDENT ':='
          {
            push_token($1);
            push_token($2);
          }
    ;

assign_stmt_lev2:
      expr
    | assign_stmt_lev2 expr
    ;

/* This is different than the usual recursive definition */
expr:
      arith_op '(' expr ')'
    | arith_op IDENT
          {
            push_token($1);
            push_token($2);
          }
    | arith_op CONST
          {
            push_token($1);
            push_token($2);
          }
    | (etc...)
    ;

arith_op:
      /* void */
    | '+'
          {
            strcpy($$, $1);
          }
    | (etc...)
    ;
As I receive the tokens, I push the verb, operands and operators onto
a stack (with the scope as a separate stack) like this:
Statement        Resulting Stack
---------        ---------------
a := b + c;      c
                 +
                 b
                 :=
                 a
And when I get to the semicolon, I evaluate the stack from the bottom
(the latest verb) up.
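
For readers following along, here is a minimal C sketch of the stack scheme described above. It is not the actual analyzer code: the names eval_stack, struct record and print_record are hypothetical, the fixed array sizes are arbitrary, and skipping an operand that equals the target is only a guess based on the sample records (line 9 lists 1 and a, but not b).

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64
#define MAX_LEN    32

/* Token stack filled by the grammar actions; index 0 is the bottom. */
static char tokens[MAX_TOKENS][MAX_LEN];
static int  ntokens = 0;

/* Push a copy of a token's text; "" stands for an absent operator. */
void push_token(const char *text)
{
    if (ntokens < MAX_TOKENS) {
        strncpy(tokens[ntokens], text, MAX_LEN - 1);
        tokens[ntokens][MAX_LEN - 1] = '\0';
        ntokens++;
    }
}

/* Hypothetical record layout mirroring the columns of the sample output. */
struct record {
    int  line;
    char a[MAX_LEN], b[MAX_LEN], verb[MAX_LEN], op[MAX_LEN];
    char kind;                       /* 'D' = declaration, 'U' = use */
};

/* Walk the stack from the bottom: target, verb, then (operator, operand)
 * pairs. Stores at most max_out records, clears the stack, and returns
 * the number of records stored. */
int eval_stack(int line, struct record *out, int max_out)
{
    int n = 0;
    if (ntokens >= 2) {
        const char *target = tokens[0];
        const char *verb   = tokens[1];
        for (int i = 2; i + 1 < ntokens && n < max_out; i += 2) {
            const char *op      = tokens[i];
            const char *operand = tokens[i + 1];
            if (strcmp(operand, target) == 0)
                continue;            /* assumption: skip self-references */
            out[n].line = line;
            strcpy(out[n].a, target);
            strcpy(out[n].b, operand);
            strcpy(out[n].verb, verb);
            strcpy(out[n].op, op);
            out[n].kind = 'U';
            n++;
        }
    }
    ntokens = 0;                     /* start the next statement empty */
    return n;
}

/* Print one record as a row of the report. */
void print_record(const struct record *r)
{
    printf("%-4d  %-7s  %-7s  %-4s  %-8s  %c\n",
           r->line, r->a, r->b, r->verb, r->op, r->kind);
}
```

Clearing the stack inside eval_stack is one possible answer to the "when should I clear the token stack" question below; an error action for the `error ';'` rule could reset ntokens the same way.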
- Is this a good method, or is there a better one?
- What's a good method for recovering from parse errors?
- Where should I put error tokens in the example above?
- When should I clear the token stack?
- What's a good method for detecting a faulty scope?
- How should I resynchronize the scope stack?
- What if they forget the semicolon?
- Can the error token be a non-terminal symbol? That would allow
resynchronization on a class of tokens. For example:
assign_stmt:
      IDENT ':=' expr ';'
    | error error_class
    ;

error_class:
      ASSIGN_OP
    | CASE
    | DO
    | END
    | IF
    | (etc...)
    | ';'
    ;
Our COBOL analyzer simply looks for the next verb, but this language
has scope of variables, so I'll have to unravel my stacks in a careful
manner.
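
The "look for the next verb" idea can be sketched outside of yacc as a plain token skip. This is only an illustration under stated assumptions: the synchronizing set below is a hypothetical subset of the error_class list, tokens are represented as strings, the function name resync is made up, and BEGIN is assumed to be the only token that opens a scope.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical synchronizing set: ';' plus a few statement-starting verbs. */
static const char *const sync_tokens[] = { ";", "CASE", "DO", "END", "IF" };

/* Return 1 if tok is in the synchronizing set, 0 otherwise. */
static int is_sync(const char *tok)
{
    size_t count = sizeof sync_tokens / sizeof sync_tokens[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(tok, sync_tokens[i]) == 0)
            return 1;
    return 0;
}

/* Skip from pos to the first synchronizing token and return its index,
 * or n if the input ends first. A BEGIN that is skipped still opens a
 * scope, so *depth is incremented to keep the scope count in step. */
size_t resync(const char *const *toks, size_t n, size_t pos, int *depth)
{
    while (pos < n && !is_sync(toks[pos])) {
        if (strcmp(toks[pos], "BEGIN") == 0)
            (*depth)++;
        pos++;
    }
    return pos;
}
```

Counting skipped BEGIN tokens is one way to keep a scope stack from drifting while tokens are being discarded; the matching END is in the synchronizing set, so the normal rules still see it.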
Any suggestions and/or examples would be greatly appreciated.
Thanks,
Keith Little
[I suggested in private mail that simplified parse trees would be a lot
handier data structure than trying to force everything into stacks. -John]
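
To make the parse tree suggestion concrete, here is a minimal C sketch of a simplified expression tree and a walk that pairs each operand with the operator written in front of it, as the sample records do. Everything here (struct node, leaf, binary, collect_uses, free_tree) is a hypothetical illustration, not code from the original exchange.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    char op;                    /* operator character, or 0 for a leaf */
    const char *name;           /* operand text; NULL for operator nodes */
    struct node *left, *right;  /* children; NULL for leaves */
};

static struct node *new_node(char op, const char *name,
                             struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->op = op;
    n->name = name;
    n->left = l;
    n->right = r;
    return n;
}

/* Build a leaf for an identifier or constant. */
struct node *leaf(const char *name)
{
    return new_node(0, name, NULL, NULL);
}

/* Build an operator node over two subtrees. */
struct node *binary(char op, struct node *l, struct node *r)
{
    return new_node(op, NULL, l, r);
}

/* One operand together with the operator that precedes it in the source
 * (0 if it is the first operand of the expression). */
struct use { const char *name; char op; };

/* In-order walk. The leftmost leaf of a subtree inherits the operator in
 * front of that subtree; the right subtree is preceded by the node's own
 * operator. Returns the updated count, never exceeding max. */
int collect_uses(const struct node *n, char prev_op,
                 struct use *out, int count, int max)
{
    if (n == NULL)
        return count;
    if (n->op == 0) {
        if (count < max) {
            out[count].name = n->name;
            out[count].op   = prev_op;
            count++;
        }
        return count;
    }
    count = collect_uses(n->left, prev_op, out, count, max);
    return collect_uses(n->right, n->op, out, count, max);
}

/* Release a whole tree. */
void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

With a conventional recursive expr rule, a yacc action could build such nodes with something like `$$ = binary('+', $1, $3);`. For `b + 1 / a` the walk yields b with no operator, 1 with + and a with /, which lines up with the records shown for line 9.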
--