Is this a some kind of regular grammar?

"Chris F Clark" <cfc@world.std.com>
2 Jul 2002 01:14:14 -0400

          From comp.compilers

Related articles
Is this a some kind of regular grammar? daniel_shane_eicon@hotmail.com (Daniel Shane) (2002-06-28)
Is this a some kind of regular grammar? cfc@world.std.com (Chris F Clark) (2002-07-02)
Re: Is this a some kind of regular grammar? michaeldyck@shaw.ca (Michael Dyck) (2002-07-02)
Re: Is this a some kind of regular grammar? robert.thorpe@antenova.com (Robert Thorpe) (2002-07-04)
| List of all articles for this month |

From: "Chris F Clark" <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 2 Jul 2002 01:14:14 -0400
Organization: Compilers Central
References: 02-06-069
Keywords: parse
Posted-Date: 02 Jul 2002 01:14:13 EDT

> One of my problems is that sometimes it is impossible for me to
> distinguish a token at the lexer stage, because multiple regular
> expression can match the same token depending on the context. What I
> need is an integrated solution which combines the lexer and the parser
> at the same time.


There are integrated solutions, sometimes called "scannerless"
parsers.


There are also other ways of dealing with the overlapping regular
expressions. Depending on why your grammar has overlaps can help
decide if one of these solutions is appropriate.


For example, it is common for (some) integer tokens to be a prefix of
real tokens. Generally, one applies the "longest match rule" in that
case, if the prefix can be legally extended into a real token, then
the real token is the desired lexing.


Another source of overlapping regular expressions is identifiers that
are also keywords. In some contexts, it is desirable to treat these
identifiers as keywords and in other contexts, simply as identifiers.
This slight-of-hand can be done by grammar modification where the
lexer always returns the keyword token, but the parser remaps the
keyword into an identifier where appropriate.


> Of course the resulting grammar would not be LALR(1), but is there a
> way to build an algorithm that can parse this type of grammar
> without going to a full blown N^3? Surely this grammar must fit
> somewhere between N^3 and N (for LALR(1).


If you used a scannerless parser based upon Tomita (aka GLR) parsing,
you would find that it does approximately what you are asking.


> Unfortunately, with the above grammar construction, we would loose
> the notion of parse tree in the process (i.e. it would be impossible
> to say, ok the document matched now can you show me the detailed
> parsing?).


There are also "predicated-parsers" which allow one to effectively
parse each thing twice, once with some general rule and then with a
rule that captures the structure. That might also solve your problem.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
------------------------------------------------------------------------------


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.