Re: regular expression operators in CF grammars

kgw-news@stiscan.com
21 Jul 2002 02:11:35 -0400

          From comp.compilers

Related articles
[12 earlier articles]
Re: regular expression operators in CF grammars cfc@world.std.com (Chris F Clark) (2002-07-02)
Re: regular expression operators in CF grammars joachim_d@gmx.de (Joachim Durchholz) (2002-07-02)
Re: regular expression operators in CF grammars soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2002-07-04)
Re: regular expression operators in CF grammars parsersinc@yahoo.com (SLK Parsers) (2002-07-04)
Re: regular expression operators in CF grammars rboland@unb.ca (Ralph Boland) (2002-07-04)
Re: regular expression operators in CF grammars cfc@world.std.com (Chris F Clark) (2002-07-15)
Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21)
Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10)
Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10)
Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23)
Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03)
Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12)
Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14)
| List of all articles for this month |

From: kgw-news@stiscan.com
Newsgroups: comp.compilers
Date: 21 Jul 2002 02:11:35 -0400
Organization: Solution Technology
References: 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043
Keywords: parse
Posted-Date: 21 Jul 2002 02:11:35 EDT

On Tue, 16 Jul 2002 03:44:57 UTC, "Chris F Clark" <cfc@world.std.com>
wrote:


[...]
> Sönke Kannapinn writes in our on-going discussion:
>
>
> The problem is this grammar isn't LR(k) for any k. Why not? Because,
> one needs unbounded (aka infinite) lookahead to distinguish between
> the id of a simple_func and the obj_id of an obj_func. You have to
> look past the left paren and into the parameter list and see whether
> it contains an "=" or a "," or is missing. The solution that seemed
> best was to use regular expressions (relying on their ambiguity) and
> replace the following two rules:
>
[...]


I happen to know of a thesis many years ago which had an interesting
approach to extending parsing. I will try to relate the main idea
with missing details.


It is possible to take a BNF which includes the lexical definitions
and extract by program something similar to the lexical scanner. This
is doen by observing these productions apply everywhere with one or
two character(token) lookhead. Reserved words are special cased as
they are in most lexical parsers.


You generate a finite state machine starting with each production as
long as it is unambiguous with all the others. The thesis actually
was implemented but I do not know the details here.


The productions recognized become the terminals in the next level
grammer and the details of the productions are removed from that
grammer. Notice that this can be done with SLR(0)-SLR(1) techniques.
Any token not processed are passed on as input to the next level
parser.


This can be done repeatedly giving a heirarchy of parsers. This
extends the lookahead somewhat at each level.


Then it is noticed that if you make the pass after the lexer SRL(1),
that if instead of substituting the LHS for the production you just
drop a new token into the grammer that indicates what it has seen.
Now you have infinite lookhead.


You can alternate these as many times as you like but usually one
extra right to left FSM pass after the lexer is all you need. More
passes would indicate the language was poorly designed.


Of couse you could generalize and do the complete LR(k)/RL(k)
parse at each level but that seemed to be overkill.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.