Re: How to parse C-programs without communication between parser and lexer

Scott Stanchfield <scooter@mccabe.com>
19 Aug 1996 23:13:33 -0400

          From comp.compilers

Related articles
How to parse C-programs without communication between parser and lexer sodtke@infbssts.ips.cs.tu-bs.de (Detlef Sodtke) (1996-08-15)
Re: How to parse C-programs without communication between parser and l henry@zoo.toronto.edu (Henry Spencer) (1996-08-16)
Re: How to parse C-programs without communication between parser and l scooter@mccabe.com (Scott Stanchfield) (1996-08-19)
How to parse C-programs without communication between parser and lexer ig@estar.msk.su (Ilfak Guilfanov) (1996-08-27)
| List of all articles for this month |
From: Scott Stanchfield <scooter@mccabe.com>
Newsgroups: comp.compilers
Date: 19 Aug 1996 23:13:33 -0400
Organization: McCabe & Associates
References: 96-08-040 96-08-051
Keywords: C, parse

This is one case where semantic predicates in PCCTS can be very handy.
(I haven't written a C compiler before, so please give me a little
lattitude...)


What you can do, is wherever in the grammar you need to know "is it a
typedef or other ident" you could add a


    <<isTypedef(LA(1))>>?


at the start of the alternative. For example, taking the example of
a variable decl in C++ that looks like a function call, we could have
(note -- this is probably not a rule you would really use in a C++
compiler, but it'll make a decent example.)


var_decl_or_function_call
    : <<isTypedef(LA(1))>>? typeID:IDENT LPAREN varID:IDENT RPAREN
            <<
                // action code to declare varID as a variable of type typeID
            >>
    | IDENT LPAREN expression_list RPAREN
            <<
                // action code to generate a function call
            >>
    ;


Again, this is not meant to be a real rule, but what it shows is how
an input like
        T(x)
can be determined to select the "var-decl" alternative or the
"function-call" alternative.


(Note that we'd probably have to qualify these alts with
syntactic-predicates/context-guards to keep them from conflicting with
other rules like assignment, but that would cloud this issue.)


(In PCCTS, LA(1) represents the first token in the token buffer, LA(2)
is the token after that, and so on.)


The <<isTypedef(LA(1))>>? thing is a semantic predicate. It's purpose
is to add an extra condition to the selection test for the alternative.
Normally, the selection test might look like
        if (LA(1) == IDENT)
            {do alt 1 stuff}


but the predicate changes it to
        if (LA(1) == IDENT && isTypedef(LA(1)))
            {do alt 1 stuff}


The stuff in between the << >>? can be any valid C expression you want,
but it must return an integer (or bool) value. In this case, the
writer of the grammar had defined a function isTypedef(x) that will
look up the token in the symbol table and return whether or not that
token represents a typedef name. This of course assumes that you have a
rule for typedef that sets it up in the symbol table...


The main idea behind semantic predicates is to allow you to move tests
that convey semantic info from the scanner to the parser. This is
especially important in parsers that utilize multiple lookahead, as the
scanner's context and parser's context (with regard to position in the
source file) can be significantly different. Instead of talking back to
the scanner, the scanner _just_ groups tokens and sticks them in a
buffer as needed. The parser can use the tokens whenever it sees fit,
and its predicates can help it distinguish what the input really means.


Hope this helps a bit. I would strongly recommend that you take a look
at PCCTS. (There are some other predicated parser generators out there,
but I don't know anything about them -- I'm really happy with PCCTS.)


      http://www.igs.net/~mtr/software-development/pccts.htm


is a good starting place for PCCTS info.


Good luck!
Scott


--
Scott Stanchfield McCabe & Associates -- Columbia, Maryland
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.