Re: How to parse keywords that can be used as identifiers?

anton@a0.complang.tuwien.ac.at (Anton Ertl)
20 Aug 1996 23:08:03 -0400

          From comp.compilers

Related articles
How to parse keywords that can be used as identifiers? mark@research.techforce.nl (Mark Thiehatten) (1996-08-19)
Re: How to parse keywords that can be used as identifiers? anton@a0.complang.tuwien.ac.at (1996-08-20)
Re: How to parse keywords that can be used as identifiers? kanze@lts.sel.alcatel.de (1996-08-20)
Re: How to parse keywords that can be used as identifiers? leichter@smarts.com (Jerry Leichter) (1996-08-21)
Re: How to parse keywords that can be used as identifiers? ph@anweald.exnet.co.uk (1996-08-24)
Re: How to parse keywords that can be used as identifiers? grosch@cocolab.sub.com (1996-08-24)
Re: How to parse keywords that can be used as identifiers? dlmoore@ix.netcom.com (David L Moore) (1996-08-24)
Re: How to parse keywords that can be used as identifiers? itz@rahul.net (1996-08-24)
[6 later articles]
| List of all articles for this month |

From: anton@a0.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: 20 Aug 1996 23:08:03 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 96-08-058
Keywords: parse

Mark Thiehatten <mark@research.techforce.nl> writes:
|> I am working on a parser for a language that allows keywords to
|> be used as identifiers. This causes all kinds of problems.
|> I would like to know if somebody has already solved this problem,
|> and, of course, how.


If you allow all keywords as identifiers everywhere, the grammar is
probably ambiguous.


I did something less ambitious 10 years ago. I did it somewhat
intuitively with a hand-written parser, here I'll try to find some
principles that were behind my intuition.


The following method will transform an LL(1) grammar into another
LL(1) grammar that allows keywords instead of identifiers in many
places; and I think you cannot allow more keywords without breaking
the LL(1) property or restructuring the grammar.


First, let me define the first' set:
first' =
    if (eps in first) then
        first+follow-{eps}
    else
        first


Associate every instance of ident in the grammar with a set, initially
consisiting of all keywords. For every rule that contains ident in its
first' set, remove first' from the set associated with the
ident. Finally, for all idents, write rules of the form:


ident1: ident | keyword1 | keyword2 | ...


with the keywords coming from the set associated with the instance of
ident; replace the instance of ident with the newly defined
nonterminal (ident1).


E.g.:
stmt: IF expr THEN stmt ELSE stmt ENDIF
        | ident Becomes expr
        ;


expr: ident
        | NOT expr
        ;


This would be transformed into:


ident1: ident|THEN|ELSE|ENDIF|NOT;
ident2: ident|IF|THEN|ELSE|ENDIF;


stmt: IF expr THEN stmt ELSE stmt ENDIF
        | ident1 Becomes expr
        ;


expr: ident2
        | NOT expr
        ;


IIRC, Bison/yacc can handle LL(1) grammars, so this should work with
them, too (although it may be possible to allow even more keywords
with LALR(1)-grammars).


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.