Handling typedefs in yacc generated parsers

Dibyendu Majumdar <dibyendu@mazumdar.demon.co.uk>
11 Jan 1999 14:38:26 -0500

          From comp.compilers


From: Dibyendu Majumdar <dibyendu@mazumdar.demon.co.uk>
Newsgroups: comp.compilers
Date: 11 Jan 1999 14:38:26 -0500
Organization: Compilers Central
Keywords: C, types


I would appreciate your advice on the following:

I am working on the UPS C Interpreter - fixing bugs and improving
compliance to the C Standard. One area where the interpreter was weak
was in the handling of typedefs. I have made changes which seem to
work - but am not sure if my way of handling it was correct.

The original implementation used a simple lookup function to
distinguish between IDENTIFIER and TYPEDEF_NAME. The parser provided a
function for this purpose - the lexical analyzer called the function
when it encountered an IDENTIFIER. If the lookup function found
that the name was a typedef name, it returned TYPEDEF_NAME - and
that's what the lexer returned as the token.
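
A minimal sketch of that original scheme might look like the
following. The names name_type() and classify_identifier(), the token
values, and the toy typedef table are assumptions made for
illustration; they are not taken from the UPS sources.

```c
#include <string.h>

/* Hypothetical token codes; a yacc build would take these from y.tab.h. */
enum token { IDENTIFIER = 258, TYPEDEF_NAME = 259 };

/* Toy stand-in for the parser's table of names declared with typedef. */
static const char *typedef_names[] = { "size_t", "FILE" };

/* Hypothetical parser-side lookup: is this name currently a typedef name? */
int name_type(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof typedef_names / sizeof typedef_names[0]; i++)
        if (strcmp(name, typedef_names[i]) == 0)
            return TYPEDEF_NAME;
    return IDENTIFIER;
}

/* Called by the lexer once it has scanned an identifier-shaped lexeme.
 * With no context taken into account, the lookup result is the token. */
int classify_identifier(const char *lexeme)
{
    return name_type(lexeme);
}
```

Note that the decision depends only on the spelling of the name, not
on where the name appears.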

The problem with this approach was that the lexer returned
TYPEDEF_NAME even in places where the grammar expected an IDENTIFIER,
causing the yacc parser (grammar based on K&R2) to fail.
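
To illustrate, here is a small made-up fragment of legal C (not taken
from UPS or its test suite) in which a name that is a typedef name at
file scope is also used as a member name, a label, and a local
variable. A lexer that classifies purely by lookup would return
TYPEDEF_NAME for every one of these uses.

```c
typedef int T;                 /* T is a typedef name at file scope */

struct point { int T; };       /* member named T: members have their own name space */

int use_member_and_label(void)
{
    struct point p;

    p.T = 1;                   /* T after '.' is a member name */
    goto T;                    /* T after goto is a label name */
T:                             /* labels also have their own name space */
    return p.T;
}

int shadow_in_block(void)
{
    int T = 2;                 /* T after a type specifier is an ordinary identifier */

    return T;
}
```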

I have solved this problem by adding context sensitivity to the
lexer. I did this as follows:

1. I added a couple of flags to the lexer.

      bool in_decl_specifier;
      bool seen_type_specifier;

      When the lexer encounters the keywords STRUCT, UNION, or
      ENUM, it sets both flags to TRUE.

      When the lexer encounters either VOID, CHAR, SHORT, INT,
      LONG, FLOAT, DOUBLE, SIGNED or UNSIGNED, it sets both flags
      to TRUE.

      When the lexer encounters either STATIC, AUTO, REGISTER, EXTERN,
      TYPEDEF, CONST or VOLATILE, and the flag in_decl_specifier
      is FALSE, it sets seen_type_specifier to FALSE (just to be sure)
      and in_decl_specifier to TRUE. Otherwise it does nothing.

      When an IDENTIFIER is found, the lexer first calls the
      parser function described before. If the parser function
      identifies a TYPEDEF_NAME, then the lexer does one of
      the following:

      * If the previous token was GOTO, DOT (.) or
          ARROW (->), it returns IDENTIFIER instead of
          TYPEDEF_NAME.

      * Else, if the in_decl_specifier flag is TRUE, the action
          depends on seen_type_specifier. If that is also TRUE, the
          lexer returns IDENTIFIER; otherwise it sets
          seen_type_specifier to TRUE and returns TYPEDEF_NAME.

      * Else, if the next token is COLON (:) and the previous token
          was either RBRACE (}) or SEMI (;), it calls a parser function
          called ci_label_allowed() (described later)
          to determine if labels are allowed. If not, it returns
          TYPEDEF_NAME; otherwise, IDENTIFIER is returned.

      * If none of the above match, the flags in_decl_specifier and
          seen_type_specifier are set to TRUE, and TYPEDEF_NAME is
          returned.

      The lexer resets the flags in_decl_specifier and
      seen_type_specifier when it encounters any token not
      allowed in a declaration specifier (including IDENTIFIER).
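
The rules above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual UPS code: the
struct lexer_state, the helper names, the token values, and passing
the lookahead token and the result of ci_label_allowed() as parameters
are all hypothetical. It also assumes that the "labels not allowed"
case sets the flags in the same way as the final case.

```c
#include <stdbool.h>

/* Hypothetical token codes; a yacc build would generate these in y.tab.h. */
enum token {
    IDENTIFIER = 258, TYPEDEF_NAME, GOTO, DOT, ARROW, COLON,
    RBRACE, SEMI, INT, STATIC, OTHER
};

struct lexer_state {
    bool in_decl_specifier;
    bool seen_type_specifier;
    int  prev_token;            /* updated by the caller after each token */
};

/* STRUCT, UNION, ENUM, VOID, CHAR, SHORT, INT, ... set both flags. */
void saw_type_keyword(struct lexer_state *s)
{
    s->in_decl_specifier = true;
    s->seen_type_specifier = true;
}

/* STATIC, AUTO, REGISTER, EXTERN, TYPEDEF, CONST, VOLATILE. */
void saw_storage_or_qualifier(struct lexer_state *s)
{
    if (!s->in_decl_specifier) {
        s->seen_type_specifier = false;
        s->in_decl_specifier = true;
    }
}

/* Any token not allowed in a declaration specifier, including IDENTIFIER. */
void reset_flags(struct lexer_state *s)
{
    s->in_decl_specifier = false;
    s->seen_type_specifier = false;
}

/* Decide which token to return for a name that the parser lookup has
 * reported as a typedef name. label_allowed stands in for the result
 * of ci_label_allowed(). */
int classify_typedef_name(struct lexer_state *s, int next_token,
                          bool label_allowed)
{
    /* goto T;  p.T  p->T : never a type in these positions. */
    if (s->prev_token == GOTO || s->prev_token == DOT ||
        s->prev_token == ARROW) {
        reset_flags(s);
        return IDENTIFIER;
    }

    if (s->in_decl_specifier) {
        if (s->seen_type_specifier) {   /* "int T": T is being declared */
            reset_flags(s);
            return IDENTIFIER;
        }
        s->seen_type_specifier = true;  /* "static T": T is the type */
        return TYPEDEF_NAME;
    }

    /* "T :" after '}' or ';' is either a label or an unnamed bitfield. */
    if (next_token == COLON &&
        (s->prev_token == RBRACE || s->prev_token == SEMI) &&
        label_allowed) {
        reset_flags(s);
        return IDENTIFIER;
    }

    /* Otherwise the name starts a declaration. */
    s->in_decl_specifier = true;
    s->seen_type_specifier = true;
    return TYPEDEF_NAME;
}
```

For example, with T a typedef name, the sequence "static T T" yields
TYPEDEF_NAME for the first T and IDENTIFIER for the second, because
the first T sets seen_type_specifier.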

2. I added two flags to the parser as well. These flags are
      set while parsing a) enum constants and b) struct/union members.

      The parser's typedef lookup function tests the first flag. If
      the flag is set, it does not look up the name at all and returns
      IDENTIFIER straight away. (If the name was already defined
      as a TYPEDEF_NAME, the redefinition is reported by the
      parser later during semantic analysis).

      The second flag is used by the lexer to determine if labels
      are allowed (see previous section) when it sees a construct
      that looks like either a label or a bitfield.
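
A rough sketch of the parser side is given below. Only the name
ci_label_allowed() comes from the description above; its signature,
the flag names, name_type(), and the one-entry typedef table are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token codes; a yacc build would take these from y.tab.h. */
enum token { IDENTIFIER = 258, TYPEDEF_NAME = 259 };

/* Hypothetical names for the two parser flags. Grammar actions would
 * set and clear them on entry to and exit from the relevant constructs. */
bool parsing_enum_constants;
bool parsing_struct_members;

/* Toy stand-in for the real symbol table: only "T" is a typedef name. */
static bool is_typedef_name(const char *name)
{
    return strcmp(name, "T") == 0;
}

/* Typedef lookup called by the lexer for each identifier-shaped lexeme. */
int name_type(const char *name)
{
    /* Inside an enumerator list, skip the lookup entirely; a clash with
     * an existing typedef name is left for semantic analysis to report. */
    if (parsing_enum_constants)
        return IDENTIFIER;
    return is_typedef_name(name) ? TYPEDEF_NAME : IDENTIFIER;
}

/* Inside a struct/union member list, "T :" is a bitfield, not a label. */
bool ci_label_allowed(void)
{
    return !parsing_struct_members;
}
```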

With the above changes, the interpreter is able to parse typedef names
correctly. In my tests so far, the namespace/scoping rules of Standard
C are followed correctly.

My question is this:

Is this the right way to deal with this problem in a yacc generated
parser? How have other people dealt with similar problems (without
rewriting the grammar as suggested by Jim Roskind)? My intention is
to avoid changing the grammar, because that would mean many more
changes to the parser, which otherwise works fine.

Any help would be much appreciated.

Thanks and Regards

The website for the UPS C Interpreter is www.concerto.demon.co.uk.
The UPS Debugger/Interpreter was created by Mark Russell. For
more information please check the website.
