From: "Stefan Monnier" <monnier+comp.compilers/news/@rum.cs.yale.edu>
Newsgroups: comp.compilers
Date: 15 Apr 2003 00:15:03 -0400
Organization: Compilers Central
References: 03-04-015 03-04-026 03-04-029
Keywords: lex
Posted-Date: 15 Apr 2003 00:15:02 EDT
> Chris, I was in total agreement with you until I got to thinking about
> comments. And in particular I am thinking about languages that do not
> have a closing comment delimiter but use end-of-line as the closing
> delimiter. Examples would be "//" in C++ and "#" in Perl.
Actually, w.r.t. parsing comments backwards, Emacs already has pretty
good code for it. It works almost 100% of the time and only "rarely"
needs to parse from the beginning of the file.
But yes, it's pretty tricky code (i.e. even though I've spent a lot of
time understanding, extending, and fixing the code, I'm not confident
at all that the one known error (left in for performance/laziness
reasons) is the only one).
The error that was left in shows up in cases like
    // this is a funny C++ comment /* followed by
    normal code with a funny */ token.
In such a case, when lexing backwards, the */ will be taken for the end
of a C style comment whereas it should of course be lexed as two tokens.
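To make the failure concrete, here is a rough sketch in Python. It is not the Emacs code; `forward_lex` and `naive_backward_block_start` are names made up for this illustration, and the toy lexer ignores strings entirely. It only shows that a backward search from the */ finds a /* that a forward lexer has already swallowed as part of the // comment.

```python
SAMPLE = (
    "// this is a funny C++ comment /* followed by\n"
    "normal code with a funny */ token.\n"
)

def forward_lex(text):
    """Toy forward lexer (hypothetical): comments, words, single-char punctuation."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if text.startswith("//", i):
            # Line comment runs to end of line; a /* inside it is inert.
            j = text.find("\n", i)
            j = n if j == -1 else j
            tokens.append(("line_comment", text[i:j]))
            i = j
        elif text.startswith("/*", i):
            j = text.find("*/", i + 2)
            j = n if j == -1 else j + 2
            tokens.append(("block_comment", text[i:j]))
            i = j
        elif text[i].isspace():
            i += 1
        elif text[i].isalnum() or text[i] == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("word", text[i:j]))
            i = j
        else:
            tokens.append(("punct", text[i]))
            i += 1
    return tokens

def naive_backward_block_start(text, close):
    """Given the index of a '*/', return the index of the nearest
    preceding '/*' (or -1) without checking what surrounds it."""
    return text.rfind("/*", 0, close)

close = SAMPLE.index("*/")
# The backward search believes it found a comment opener...
print(naive_backward_block_start(SAMPLE, close) != -1)
# ...but the forward lexer sees no block comment at all, just '*' and '/'.
print([kind for kind, _ in forward_lex(SAMPLE)].count("block_comment"))
```

Getting this right going backwards means also checking whether the candidate /* sits on a line that has an earlier //, which is the extra work that was skipped.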
> So you would have to parse back to the beginning of the program before
> you can decide that a single quote mark is part of a comment and not a
> string delimiter.
But if there's no string quote in the comment, you don't need to parse
from the beginning of the file to know that it's really a comment
(although it could also be a comment-inside-a-string, but that would
mean you started lexing from inside a lex-element, in which case you
can't expect to lex locally and get a correct answer anyway).
Besides, lexing Perl is already tremendously difficult going forward.
In any case, I'm probably more interested in the "easy" case, where
lex-elements are "simple". E.g. I'd like to lex backwards assuming that
things like /*, */, ", and ' are tokens (the handling of comments and
strings can be left to another layer).
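A minimal sketch of that "easy" case, in Python: scan right to left and report /*, */, " and ' as plain tokens, leaving it to some other layer to decide what is a comment or a string. The function name `lex_backward` and the token set are assumptions for illustration only. Note that even here backward and forward longest-match can disagree (e.g. on `/*/`), so this is not claimed to mirror a forward lexer exactly.

```python
TWO_CHAR_TOKENS = {"/*", "*/"}  # assumed delimiter set, for illustration

def lex_backward(text, pos=None):
    """Yield tokens right-to-left, starting just before index `pos`
    (default: end of text). Delimiters are ordinary tokens here."""
    i = len(text) if pos is None else pos
    while i > 0:
        c = text[i - 1]
        if c.isspace():
            i -= 1
        elif i >= 2 and text[i - 2:i] in TWO_CHAR_TOKENS:
            # Prefer the two-character delimiter ending at this position.
            yield text[i - 2:i]
            i -= 2
        elif c.isalnum() or c == "_":
            # Extend leftwards over the whole word.
            j = i
            while j > 0 and (text[j - 1].isalnum() or text[j - 1] == "_"):
                j -= 1
            yield text[j:i]
            i = j
        else:
            # Anything else, including " and ', is a one-character token.
            yield c
            i -= 1

print(list(lex_backward('x = "a" /* c */ y')))
# ['y', '*/', 'c', '/*', '"', 'a', '"', '=', 'x']
```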
Stefan