Re: LR-parser-based lexical analysis - does it work?

"Zack Weinberg" <zackw@panix.com>
20 Oct 2002 22:45:31 -0400

          From comp.compilers


From: "Zack Weinberg" <zackw@panix.com>
Newsgroups: comp.compilers
Date: 20 Oct 2002 22:45:31 -0400
Organization: PANIX -- Public Access Networks Corp.
References: 02-10-030 02-10-052
Keywords: C, lex
Posted-Date: 20 Oct 2002 22:45:31 EDT

VBDis <vbdis@aol.com> writes:
>"Sönke Kannapinn" <soenke.kannapinn@wincor-nixdorf.com> writes:
>
>>* If it doesn't work: Where are the problems with it? Do you know
>>counter-examples of programming languages where one can't do
>>lexical analysis like that?
>>(I know of Pascal's '..' problem; are there other problem cases?)
>
>Currently I'm trying to construct a C scanner and parser, for cross
>compilation. The C specification mentions more than 3 steps of lexical
>processing, before tokens can be created. IMO the only practical
>solution here is a multi-level scanner, which does all substitutions
>before passing the characters to the next stage.


Not so; a single-pass lexer for C is quite possible, although somewhat
of a pain to implement (it could be made much easier with only trivial
adjustments to the language, but that is a rant for another forum).
GCC 3.x and later has such a lexer ("cpplib").
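
To illustrate the general idea (this is only a sketch, not how cpplib
is actually written): the early translation phases can be folded into
the routine that fetches the next character, so the tokenizer never
needs a separate substitution pass. The function name logical_chars
and the table TRIGRAPHS are made up for this example, and only
trigraph replacement and backslash-newline splicing are handled.

```python
# Hypothetical sketch: apply trigraph replacement (phase 1) and
# backslash-newline splicing (phase 2) while reading characters,
# instead of rewriting the whole input before tokenizing.

# The nine trigraphs defined by ISO C: "??" plus the key yields the value.
TRIGRAPHS = {
    "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
    "<": "{", "!": "|", ">": "}", "-": "~",
}


def logical_chars(text):
    """Yield the characters a tokenizer should see, in a single pass."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        width = 1
        # Phase 1: a trigraph such as ??/ stands for one character.
        if text.startswith("??", i) and text[i + 2:i + 3] in TRIGRAPHS:
            ch = TRIGRAPHS[text[i + 2]]
            width = 3
        # Phase 2: a backslash directly before a newline splices the
        # two physical lines; both characters are dropped.
        if ch == "\\" and text[i + width:i + width + 1] == "\n":
            i += width + 1
            continue
        yield ch
        i += width
```

A real lexer would also have to track physical line numbers for
diagnostics, which is a large part of what makes this a pain.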


>I also had some problems with the C preprocessor, which must know
>about escaped and non-escaped line ends in #define. Also in #define
>the leading '(' of an argument list must immediately follow the
>identifier, with no whitespace allowed in between. In #include I had
>problems with the <file> syntax, because '<' is an operator in other
>contexts (expressions), and the allowed characters in a path
>specification differ from other (literal, identifier) character sets.
>To me this looks like a context sensitive lexical grammar?


Yes, indeed.
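
A small sketch of what that context sensitivity looks like in practice,
under heavy simplifying assumptions (one directive per call, no
comments, no macro-expanded #include). The function lex_directive and
its token names are invented for this example. The same '<' that is an
operator elsewhere starts a header-name after #include, and a '('
counts as a parameter list only if it touches the macro name.

```python
import re

# Hypothetical, simplified patterns; a real preprocessor handles far more.
DIRECTIVE = re.compile(r"\s*#\s*([A-Za-z_][A-Za-z0-9_]*)")
HEADER = re.compile(r"\s*(<[^>\n]+>|\"[^\"\n]+\")")
MACRO_NAME = re.compile(r"\s+([A-Za-z_][A-Za-z0-9_]*)")


def lex_directive(line):
    """Return (kind, text) tokens for the start of one directive line."""
    tokens = []
    m = DIRECTIVE.match(line)
    if not m:
        return tokens
    name = m.group(1)
    tokens.append(("directive", name))
    rest = line[m.end():]
    if name == "include":
        # In this context only, '<' opens a header-name whose allowed
        # characters differ from identifiers and string literals.
        h = HEADER.match(rest)
        if h:
            tokens.append(("header-name", h.group(1)))
    elif name == "define":
        d = MACRO_NAME.match(rest)
        if d:
            tokens.append(("macro-name", d.group(1)))
            # Whitespace matters here: '(' must directly follow the name.
            after = rest[d.end():]
            kind = "function-like" if after.startswith("(") else "object-like"
            tokens.append(("macro-kind", kind))
    return tokens
```

With this, "#define f(x) x" is reported as function-like, while
"#define f (x) x" is object-like with "(x) x" as its replacement text.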


zw

