Related articles |
---|
bison parser : retrieving values from recursive pattern desharchana19@gmail.com (Archana Deshmukh) (2023-07-06) |
Re: bison parser : retrieving values from recursive pattern 864-117-4973@kylheku.com (Kaz Kylheku) (2023-07-07) |
From: | Kaz Kylheku <864-117-4973@kylheku.com> |
Newsgroups: | comp.compilers |
Date: | Fri, 07 Jul 2023 01:14:04 -0000 |
Organization: | Compilers Central |
References: | 23-07-001 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="50127"; mail-complaints-to="abuse@iecc.com" |
Keywords: | yacc, design |
Posted-Date: | 06 Jul 2023 21:17:02 EDT |
On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
> Hello,
>
> I have a following rule
>
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt
>
Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.
Consider a nice grammar like
list : '(' items ')'
| '(' ')'
| '[' items ']'
| '[' ']'
items : items ',' item
| item
;
item : list | num | type | decl
decl : keyword ':' oper list
keyword : KW_main | KW_data | KW_param_1
type : TYPE_float32 | ...
oper : OPER_r | OPER_or
I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.
I.e. the over-generated grammar would allow nonsense like
@data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)
This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.
list : '(' items ')'
| '(' ')'
| '[' items ']'
| '[' ']'
items : items ',' item
| item
;
item : list | num | SYMBOL | decl
decl : SYMBOL ':' SYMBOL list
Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.
Make your SYMBOl objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:
SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */
The first time intern("foo") is called it creates and return
s a symbol sym such that sym->name is foo (a strdup-ed copy)
The second time intern("foo") is called, it returns exactly
the same object!
In your program you can have initialization like this:
struct symbol *float32_s;
void global_init(void)
{
float32_s = intern("float32");
...
}
Then when the parser sees float32, it will produce
the same pointer.
The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator;
void foo(struct symbol *x)
{
if (X == float32_s) {
// we are looking at the float32 symbol
}
}
Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.
Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.