Re: bison parser : retrieving values from recursive pattern

Kaz Kylheku <>
Fri, 07 Jul 2023 01:14:04 -0000

          From comp.compilers

Related articles
bison parser : retrieving values from recursive pattern (Archana Deshmukh) (2023-07-06)
Re: bison parser : retrieving values from recursive pattern (Kaz Kylheku) (2023-07-07)
| List of all articles for this month |

From: Kaz Kylheku <>
Newsgroups: comp.compilers
Date: Fri, 07 Jul 2023 01:14:04 -0000
Organization: Compilers Central
References: 23-07-001
Injection-Info:; posting-host=""; logging-data="50127"; mail-complaints-to=""
Keywords: yacc, design
Posted-Date: 06 Jul 2023 21:17:02 EDT

On 2023-07-06, Archana Deshmukh <> wrote:
> Hello,
> I have a following rule
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt

Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.

Consider a nice grammar like

    list : '(' items ')'
              | '(' ')'
              | '[' items ']'
              | '[' ']'

    items : items ',' item
                | item

    item : list | num | type | decl

    decl : keyword ':' oper list

    keyword : KW_main | KW_data | KW_param_1

    type : TYPE_float32 | ...

    oper : OPER_r | OPER_or

I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.

I.e. the over-generated grammar would allow nonsense like

    @data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)

This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.

    list : '(' items ')'
              | '(' ')'
              | '[' items ']'
              | '[' ']'

    items : items ',' item
                | item

    item : list | num | SYMBOL | decl

    decl : SYMBOL ':' SYMBOL list

Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.

Make your SYMBOl objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:

      SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */

The first time intern("foo") is called it creates and return
s a symbol sym such that sym->name is foo (a strdup-ed copy)
The second time intern("foo") is called, it returns exactly
the same object!

In your program you can have initialization like this:

    struct symbol *float32_s;

    void global_init(void)
          float32_s = intern("float32");


Then when the parser sees float32, it will produce
the same pointer.

The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator;

    void foo(struct symbol *x)
        if (X == float32_s) {
            // we are looking at the float32 symbol



Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.

Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.