Newbie needs YACC Help!

Keith Little <gklittle@sprintmail.com>
16 Feb 1997 22:30:55 -0500

          From comp.compilers


From: Keith Little <gklittle@sprintmail.com>
Newsgroups: comp.compilers
Date: 16 Feb 1997 22:30:55 -0500
Organization: Spectra Software
Keywords: parse

I was wondering if anyone could help me. I'm writing an analyzer for
a language very similar to C. It only has to relate variable
declarations to their uses (though not the way a real compiler does).
In an assignment statement, for instance, every Rvalue relates to the
Lvalue.


For example:


     1  INT c;                  (global)
     2
     3  INT x();                (Integer Procedure declaration)
     4  INT a;                  (variable passed by value)
     5  BEGIN
     6      INT b,
     9      b := b + 1 / a;     (1 & a relate to b)
    10      c := b;             (b relates to c)
    11      RETURN 1;
    12  END;


The records my program would generate are:


Line  A field  B field  Verb  Operator  Dec or Use
----  -------  -------  ----  --------  ----------
1     c                                 D
3     x                                 D
4     a                                 D
6     b                                 D
9     b        1        :=    +         U
9     b        a        :=    /         U
10    c        b        :=              U


My present yacc code for parsing assignment statements and expressions
is as follows:


assign_stmt:
        assign_stmt_lev1 assign_stmt_lev2 ';'
            {
                eval_stmt();
            }
    |   error ';'
    ;


assign_stmt_lev1:
        IDENT ':='
            {
                push_token($1);
                push_token($2);
            }
    ;


assign_stmt_lev2:
        expr
    |   assign_stmt_lev2 expr
    ;


expr:   /* This is different than the usual recursive definition */
        arith_op '(' expr ')'
    |   arith_op IDENT
            {
                push_token($1);
                push_token($2);
            }
    |   arith_op CONST
            {
                push_token($1);
                push_token($2);
            }
    |   (etc...)
    ;


arith_op:
        /* void */
    |   '+'
            {
                strcpy($$, $1);
            }
    |   (etc...)
    ;


As I receive the tokens, I push the verb, operands and operators onto
a stack (with the scope as a separate stack) like this:


Statement        Resulting Stack
---------        ---------------
a := b + c;      c
                 +
                 b
                 :=
                 a


And when I get to the semicolon, I evaluate the stack from the bottom
(the latest verb) up.
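
To make the stack idea concrete, here is a minimal sketch in C of how the
token stack and its bottom-up evaluation might look. It is only an
illustration: the record layout, the array sizes, clear_tokens(), and the
arguments given to eval_stmt() are hypothetical and not taken from my
actual program. It also assumes that an empty string is pushed when an
operand has no operator, and that an operand equal to the Lvalue produces
no record (the table above shows none for "b relates to b").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64
#define MAX_TEXT   32

/* One output record, following the columns of the table above. */
struct record {
    int  line;
    char a[MAX_TEXT];     /* A field: the Lvalue */
    char b[MAX_TEXT];     /* B field: one Rvalue */
    char verb[MAX_TEXT];  /* e.g. ":=" */
    char op[MAX_TEXT];    /* operator preceding the Rvalue, "" if none */
    char kind;            /* 'D' (declaration) or 'U' (use) */
};

/* Hypothetical token stack; index 0 is the bottom. */
static char stack[MAX_TOKENS][MAX_TEXT];
static int  depth = 0;

/* Push a copy of the token text. Assumption: "" means "no operator". */
void push_token(const char *text)
{
    assert(text != NULL);
    assert(depth < MAX_TOKENS);
    snprintf(stack[depth], MAX_TEXT, "%s", text);
    depth++;
}

/* Discard everything, e.g. after a statement or after a parse error. */
void clear_tokens(void)
{
    depth = 0;
}

/* Read the stack from the bottom: Lvalue, verb, then (operator, operand)
   pairs. Fills out[] with one 'U' record per operand, clears the stack,
   and returns the number of records written. */
int eval_stmt(int line, struct record *out, int max_out)
{
    int i, count = 0;

    for (i = 2; depth >= 2 && i + 1 < depth && count < max_out; i += 2) {
        const char *op      = stack[i];
        const char *operand = stack[i + 1];

        if (strcmp(operand, stack[0]) == 0)
            continue;   /* assumption: no record for "b relates to b" */

        out[count].line = line;
        snprintf(out[count].a,    MAX_TEXT, "%s", stack[0]);
        snprintf(out[count].b,    MAX_TEXT, "%s", operand);
        snprintf(out[count].verb, MAX_TEXT, "%s", stack[1]);
        snprintf(out[count].op,   MAX_TEXT, "%s", op);
        out[count].kind = 'U';
        count++;
    }
    clear_tokens();
    return count;
}
```

Under those assumptions, pushing b, :=, "", b, +, 1, /, a for line 9 and
then calling eval_stmt() yields the two records shown for line 9 in the
table. Clearing the stack inside eval_stmt() is one possible answer to the
"when should I clear it" question; calling clear_tokens() from the error
rule would be the other obvious place.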


- Is this a good method, or is there a better one?
- What's a good method for recovering from parse errors?
- Where should I put error tokens in the example above?
- When should I clear the token stack?
- What's a good method for detecting a faulty scope?
- How should I resynchronize the scope stack?
- What if they forget the semicolon?
- Can the error token be a non-terminal symbol? That would allow
    resynchronization on a class of tokens. For example:


assign_stmt:
        IDENT ':=' expr ';'
    |   error error_class
    ;


error_class:
        ASSIGN_OP
    |   CASE
    |   DO
    |   END
    |   IF
    |   (etc...)
    |   ';'
    ;


Our COBOL analyzer simply looks for the next verb, but this language
has scope of variables, so I'll have to unravel my stacks in a careful
manner.
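
To show what I mean by unravelling the scope stack, here is a rough sketch
in C of one way it could work. Every name in it (declare, enter_scope,
leave_scope, lookup, unwind_to) is hypothetical, as are the fixed array
sizes. The idea is that each symbol remembers the nesting level at which
it was declared, so that an error rule can unwind to a known level instead
of popping entries one at a time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 128
#define MAX_NAME 32

struct symbol {
    char name[MAX_NAME];
    int  level;           /* nesting depth at declaration; 0 is global */
};

static struct symbol syms[MAX_SYMS];
static int nsyms = 0;     /* number of entries in use */
static int level = 0;     /* current nesting depth */

/* Record a declaration at the current nesting level. */
void declare(const char *name)
{
    assert(name != NULL);
    assert(nsyms < MAX_SYMS);
    snprintf(syms[nsyms].name, MAX_NAME, "%s", name);
    syms[nsyms].level = level;
    nsyms++;
}

/* Called when a BEGIN (or procedure header) is seen. */
void enter_scope(void)
{
    level++;
}

/* Called at END: drop every symbol declared at the current level. */
void leave_scope(void)
{
    assert(level > 0);
    while (nsyms > 0 && syms[nsyms - 1].level == level)
        nsyms--;
    level--;
}

/* Find the innermost declaration of name; returns its level, or -1. */
int lookup(const char *name)
{
    int i;

    for (i = nsyms - 1; i >= 0; i--)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].level;
    return -1;
}

/* Error recovery: unwind to a known level, e.g. 0 after a bad procedure. */
void unwind_to(int target)
{
    assert(target >= 0);
    while (level > target)
        leave_scope();
}
```

With something like this, an error rule that resynchronizes on END could
call unwind_to() with the level saved when the enclosing procedure began,
which would leave the global declarations intact.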


Any suggestions and/or examples would be greatly appreciated.


Thanks,


Keith Little
[I suggested in private mail that simplified parse trees would be a much
handier data structure than trying to force everything into stacks. -John]
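
As an illustration of the parse tree suggestion, here is a minimal sketch
in C. The node layout and the function names (leaf, binop,
collect_rvalues, free_tree) are hypothetical. It assumes the grammar is
written in the usual recursive form, with actions along the lines of
$$ = binop("+", $1, $3), and that each operand should be paired with the
operator that precedes it in the source text, as in the Operator column of
the table in the post. An in-order walk that remembers the last operator
passed gives that pairing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum kind { LEAF, BINOP };

struct node {
    enum kind    kind;
    const char  *text;    /* identifier or constant, or the operator */
    struct node *left, *right;
};

/* One Rvalue together with the operator that precedes it ("" if none). */
struct relation {
    const char *operand;
    const char *op;
};

static struct node *make(enum kind k, const char *text,
                         struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);

    assert(n != NULL);
    n->kind = k;
    n->text = text;
    n->left = l;
    n->right = r;
    return n;
}

struct node *leaf(const char *text)
{
    return make(LEAF, text, NULL, NULL);
}

struct node *binop(const char *op, struct node *l, struct node *r)
{
    return make(BINOP, op, l, r);
}

/* In-order walk: left subtree, then the operator, then the right subtree.
   Each leaf is stored with the operator most recently passed. */
static int walk(const struct node *n, const char **last_op,
                struct relation *out, int max, int count)
{
    if (n == NULL)
        return count;
    if (n->kind == LEAF) {
        if (count < max) {
            out[count].operand = n->text;
            out[count].op = *last_op;
            count++;
        }
        return count;
    }
    count = walk(n->left, last_op, out, max, count);
    *last_op = n->text;
    return walk(n->right, last_op, out, max, count);
}

/* Collect every operand of an expression; returns how many were stored. */
int collect_rvalues(const struct node *expr, struct relation *out, int max)
{
    const char *last_op = "";

    return walk(expr, &last_op, out, max, 0);
}

void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

For b + 1 / a, built as binop("+", leaf("b"), binop("/", leaf("1"),
leaf("a"))), the walk pairs b with no operator, 1 with +, and a with /.
A tree for a statement that fails to parse can simply be freed, which may
be easier than repairing a half-filled stack.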

