|LR-parser-based lexical analysis - does it work? email@example.com (Sönke Kannapinn) (2002-10-13)|
|Re: LR-parser-based lexical analysis - does it work? cfc@shell01.TheWorld.com (Chris F Clark) (2002-10-18)|
|Re: LR-parser-based lexical analysis - does it work? firstname.lastname@example.org (Vladimir N. Makarov) (2002-10-18)|
|Re: LR-parser-based lexical analysis - does it work? email@example.com (VBDis) (2002-10-18)|
|Re: LR-parser-based lexical analysis - does it work? firstname.lastname@example.org (Brian Smith) (2002-10-18)|
|Re: LR-parser-based lexical analysis - does it work? email@example.com (Josef Grosch) (2002-10-18)|
|Re: LR-parser-based lexical analysis - does it work? firstname.lastname@example.org (Zack Weinberg) (2002-10-20)|
|From:||"Zack Weinberg" <email@example.com>|
|Date:||20 Oct 2002 22:45:31 -0400|
|Organization:||PANIX -- Public Access Networks Corp.|
|Posted-Date:||20 Oct 2002 22:45:31 EDT|
VBDis <firstname.lastname@example.org> writes:
>"Sönke Kannapinn" <email@example.com> writes:
>>* If it doesn't work: Where are the problems with it? Do you know
>>counter-examples of programming languages where one can't do
>>lexical analysis like that?
>>(I know of Pascal's '..' problem; are there other problem cases?)
>Currently I'm trying to construct a C scanner and parser, for cross
>compilation. The C specification mentions more than 3 steps of lexical
>processing, before tokens can be created. IMO the only practical
>solution here is a multi-level scanner, which does all substitutions
>before passing the characters to the next stage.
Not so; a single-pass lexer for C is quite possible, although somewhat
of a pain to implement (it could be made much easier with only trivial
adjustments to the language, but that is a rant for another forum).
Newer GCC (3.x and later) has such a lexer ("cpplib").
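
To illustrate the single-pass idea (this is a toy sketch in Python, not
cpplib's actual code; the names spliced_chars and words are made up for
the example): the backslash-newline splicing that the standard describes
as an earlier phase can be folded into the character reader, so the
tokenizer never needs a separate substitution pass over the input.

```python
def spliced_chars(text):
    """Yield the characters of `text`, dropping each backslash-newline
    pair on the fly instead of in a separate pass over the input."""
    i, n = 0, len(text)
    while i < n:
        if text[i] == "\\" and i + 1 < n and text[i + 1] == "\n":
            i += 2  # line splice: skip both characters
            continue
        yield text[i]
        i += 1


def words(text):
    """Collect runs of identifier characters from the spliced stream.

    Deliberately simplistic: it only shows that the token builder
    sees already-spliced characters while reading the input once.
    """
    out, cur = [], ""
    for ch in spliced_chars(text):
        if ch.isalnum() or ch == "_":
            cur += ch
        else:
            if cur:
                out.append(cur)
            cur = ""
    if cur:
        out.append(cur)
    return out
```

With this, words("foo\\\nbar = 1;") sees "foobar" as a single word even
though the source splits it across two physical lines. A real C lexer
would also have to handle trigraphs and comments in the same reader,
which is where most of the pain comes from.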
>I also had some problems with the C preprocessor, which must know
>about escaped and non-escaped line ends in #define. Also in #define
>the leading '(' of an argument list must immediately follow the
>identifier, with no whitespace allowed in between. In #include I had
>problems with the <file> syntax, because '<' is an operator in other
>contexts (expressions), and the allowed characters in a path
>specification differ from other (literal, identifier) character sets.
>To me this looks like a context sensitive lexical grammar?
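
The two context-dependent cases mentioned above can be handled in one
scanner with a small amount of state. The following is a minimal sketch
in Python under simplifying assumptions (one logical line at a time, a
tiny token set, no comments); lex_line and macro_kind are hypothetical
names, not taken from any real preprocessor.

```python
import re

# Ordinary tokens: identifiers, integers, string literals, some punctuators.
TOKEN = re.compile(
    r'\s*(?:(?P<ident>[A-Za-z_]\w*)'
    r'|(?P<number>\d+)'
    r'|(?P<string>"[^"\n]*")'
    r'|(?P<punct>[#<>()+\-*/=,;.]))'
)
# Header names are only recognized directly after "# include".
HEADER = re.compile(r'\s*(?P<header><[^>\n]+>|"[^"\n]+")')
# A function-like macro needs "(" directly after the name, no whitespace.
DEFINE = re.compile(r'\s*#\s*define\s+(?P<name>[A-Za-z_]\w*)(?P<paren>\()?')


def lex_line(line):
    """Tokenize one logical line; '<' starts a header name only after #include."""
    tokens, pos, expect_header = [], 0, False
    while True:
        if expect_header:
            expect_header = False
            m = HEADER.match(line, pos)
            if m:
                tokens.append(("header", m.group("header")))
                pos = m.end()
                continue
        m = TOKEN.match(line, pos)
        if not m:
            break
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
        # Switch mode once the line has begun with '#' 'include'.
        if tokens == [("punct", "#"), ("ident", "include")]:
            expect_header = True
    return tokens


def macro_kind(line):
    """Classify a #define line, or return None if it is not a #define."""
    m = DEFINE.match(line)
    if not m:
        return None
    return "function-like" if m.group("paren") else "object-like"
```

Here lex_line("#include <stdio.h>") produces a single header token for
<stdio.h>, while lex_line("a < b") produces '<' as an ordinary
punctuator. Likewise macro_kind("#define F(x) x") is "function-like" but
macro_kind("#define F (x) x") is "object-like". The context sensitivity
is confined to one flag and one whitespace check, so it does not by
itself force a multi-level scanner.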