Re: LR-parser-based lexical analysis - does it work?

"Josef Grosch" <grosch@cocolab.de>
18 Oct 2002 23:25:15 -0400

          From comp.compilers


From: "Josef Grosch" <grosch@cocolab.de>
Newsgroups: comp.compilers
Date: 18 Oct 2002 23:25:15 -0400
Organization: CoCoLab, Achern, Germany
Keywords: LR(1), lex
Posted-Date: 18 Oct 2002 23:25:15 EDT

> In a compiler generator project we are thinking about building
> scanners using LR parser techniques.


Having worked with scanner generators and attribute grammars for a
long time and for almost every language, I would be very cautious about
using LR parsing for lexical analysis. In my opinion a tool for scanner
generation should be at least as powerful as lex, flex, or rex. You
almost always need:


- unbounded lookahead
- ambiguous specifications (e.g. keyword vs. identifier)
- right context (e.g. Pascal's '..' problem)
- start states
- loopholes allowing hand-written code
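
Two of the items above, the keyword/identifier ambiguity and right
context, can be illustrated with a minimal sketch. It is not how lex,
flex, or rex work internally; it uses Python's backtracking `re` module,
and the token set, the `KEYWORDS` table, and the `scan` function are
hypothetical names chosen for this example. It also assumes a dialect in
which the fraction of a real number may be empty (so "1." is a real),
which is what makes the lookahead necessary.

```python
import re

# Hypothetical keyword table: keywords also match the identifier pattern,
# so the ambiguity is resolved after matching, by table lookup.
KEYWORDS = {"array", "of", "begin", "end"}

TOKEN_RE = re.compile(r"""
    (?P<REAL>\d+\.(?!\.)\d*)             # real: the dot must NOT be followed by a dot
  | (?P<INT>\d+)                         # integer
  | (?P<DOTDOT>\.\.)                     # range operator
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)    # identifier or keyword
  | (?P<PUNCT>[\[\].;:,])                # single-character punctuation
  | (?P<WS>\s+)                          # whitespace, skipped
""", re.VERBOSE)

def scan(text):
    """Return a list of (kind, lexeme) pairs for a toy Pascal-like input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue
        if kind == "IDENT" and lexeme.lower() in KEYWORDS:
            kind = "KEYWORD"  # ambiguous specification: keyword wins
        tokens.append((kind, lexeme))
    return tokens

print(scan("a: array[1..10] of real"))
```

Without the `(?!\.)` lookahead, the input "1..10" would be split as the
real "1." followed by a stray ".", which is the '..' problem in
miniature.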


While I use attribute grammars for parsing and semantic analysis,
I have never missed them for lexical analysis.


> (I know of Pascal's '..' problem; are there other problem cases?)


Virtually every language has problems similar to Pascal's (Modula-2,
COBOL, and PL/I, for instance). For example, Ada has the "quote problem":
how should this input be scanned?


      t ' ( ' , ' , ' , ' , ' , ' )
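
One common way to handle this input is to let the scanner remember the
previous token: an apostrophe directly after a name or a closing
parenthesis is treated as a tick (attribute or qualified expression),
otherwise as the start of a character literal. The sketch below is only
an illustration under that assumption; `scan_ada_fragment` and the token
names are hypothetical, and it covers just the characters that occur in
the example.

```python
def scan_ada_fragment(text):
    """Tokenize identifiers, ticks, character literals, parentheses, commas."""
    simple = {"(": "LPAREN", ")": "RPAREN", ",": "COMMA"}
    tokens = []
    prev_kind = None  # kind of the most recently emitted token
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch.isalpha():
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
        elif ch == "'":
            # Context decides: after a name or ')' the apostrophe is a tick.
            after_name = prev_kind in ("IDENT", "RPAREN")
            if not after_name and i + 2 < n and text[i + 2] == "'":
                tokens.append(("CHAR", text[i:i + 3]))
                i += 3
            else:
                tokens.append(("TICK", ch))
                i += 1
        elif ch in simple:
            tokens.append((simple[ch], ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at {i}")
        prev_kind = tokens[-1][0]
    return tokens

# The input from the post, written without the spaces.
source = "t'(" + ",".join(["','"] * 3) + ")"
print(scan_ada_fragment(source))
```

A scanner that always prefers the character literal would instead read
the first three characters after `t` as the literal '(' and lose
synchronization for the rest of the line.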


The real challenges are, for example, the handling of strings in almost
all languages (escape sequences, continuation lines), the asm
statement in C++, EXEC SQL in COBOL, and the scanning of legacy
languages such as FORTRAN, JCL, or RPG.
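
As a small illustration of why strings tend to call for start states, here
is a sketch of a two-state scanner that decodes double-quoted literals.
It assumes C-like escapes and backslash-newline continuation, which is
only one of many conventions; `ESCAPES` and `scan_strings` are
hypothetical names for this example.

```python
# Assumed, C-like escape table; real languages differ widely here.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def scan_strings(text):
    """Return the decoded values of all double-quoted literals in text."""
    state = "DEFAULT"   # start state: outside any string
    buf = []
    values = []
    i = 0
    while i < len(text):
        ch = text[i]
        if state == "DEFAULT":
            if ch == '"':
                state, buf = "STRING", []   # enter the string start state
            i += 1
        elif ch == "\\":
            if i + 1 >= len(text):
                raise ValueError("dangling backslash at end of input")
            nxt = text[i + 1]
            if nxt == "\n":
                pass                        # continuation line: drop both characters
            elif nxt in ESCAPES:
                buf.append(ESCAPES[nxt])
            else:
                raise ValueError(f"unknown escape \\{nxt} at {i}")
            i += 2
        elif ch == '"':
            values.append("".join(buf))
            state = "DEFAULT"               # leave the string start state
            i += 1
        elif ch == "\n":
            raise ValueError(f"unterminated string at {i}")
        else:
            buf.append(ch)
            i += 1
    if state == "STRING":
        raise ValueError("unterminated string at end of input")
    return values

print(scan_strings('x = "a\\tb" + "c\\\nd"'))
```

The rules inside the literal (escapes, continuation, error on a bare
newline) apply only in the STRING state, which is what start states in a
scanner generator express declaratively.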


--
Dr. Josef Grosch
Email: grosch@cocolab.de

