From: "Stefan Monnier" <firstname.lastname@example.org>
Date: 15 Apr 2003 00:15:03 -0400
References: 03-04-015 03-04-026 03-04-029
Posted-Date: 15 Apr 2003 00:15:02 EDT
> Chris, I was in total agreement with you until I got to thinking about
> comments. And in particular I am thinking about languages that do not
> have a closing comment delimiter but use end-of-line as the closing
> delimiter. Examples would be "//" in C++ and "#" in Perl.
Actually, w.r.t. parsing comments backwards, Emacs already has pretty
good code for it. It works almost 100% and "rarely" needs to parse
from the beginning of file.
But yes, it's pretty tricky code (i.e. even though I've spent a lot of
time understanding, extending, and fixing the code, I'm not confident
at all that the one known error (left in for performance/laziness
reasons) is the only one).
The one known error that was left in shows up in cases like
// this is a funny C++ comment /* followed by
normal code with a funny */ token.
In such a case, when lexing backwards, the */ will be taken as the end of
a C-style comment, whereas it should of course be lexed as two tokens.
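
To make the failure concrete, here is a minimal Python sketch. It is not
the Emacs code; both scanners are hypothetical, ignore strings entirely,
and the backward one deliberately trusts every */ it meets. The function
names and the SRC constant are assumptions made for illustration.

```python
def forward_comment_spans(src):
    """Scan forwards; return (start, end) spans of // and /* */ comments."""
    spans, i, n = [], 0, len(src)
    while i < n:
        if src.startswith("//", i):
            # A line comment runs to the end of the line (or of the input).
            j = src.find("\n", i)
            j = n if j == -1 else j
            spans.append((i, j))
            i = j
        elif src.startswith("/*", i):
            # A block comment runs to the first */ (or to the end of input).
            j = src.find("*/", i + 2)
            j = n if j == -1 else j + 2
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


def naive_backward_comment_spans(src):
    """Scan backwards, treating every */ as the end of a block comment."""
    spans, i = [], len(src)
    while i > 0:
        if i >= 2 and src[i - 2:i] == "*/":
            # Look for the nearest /* that ends before this */ begins.
            j = src.rfind("/*", 0, i - 2)
            if j == -1:
                i -= 2
                continue
            spans.append((j, i))
            i = j
        else:
            i -= 1
    return spans


SRC = ("// this is a funny C++ comment /* followed by\n"
       "normal code with a funny */ token.\n")

forward = forward_comment_spans(SRC)
backward = naive_backward_comment_spans(SRC)

# The forward scan sees only the // comment on the first line.
print([SRC[a:b] for a, b in forward])
# The backward scan instead reports a bogus /* ... */ spanning both lines.
print([SRC[a:b] for a, b in backward])
```

Going forward, the /* is swallowed by the // comment; going backward, the
scanner meets the */ first and has no local evidence that its apparent
opener sits inside a line comment.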
> So you would have to parse back to the beginning of the program before
> you can decide that a single quote mark is part of a comment and not a
> string delimiter.
But if there's no string quote in the comment, you don't need to parse
from the beginning of the file to know that it's really a comment
(although it could also be a comment-inside-a-string, but in that case
it means you're lexing from inside a lex-element in which case you
can't expect to lex locally and get a correct answer anyway).
In any case, lexing Perl is already tremendously difficult going forward.
Anyway, I'm probably more interested in the "easy" case, which is where
lex-elements are "simple". E.g. I'd like to lex backwards assuming that
things like /*, */, ", and ' are tokens (the handling of comments and
strings can be left to another layer).
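
The "easy" case could be sketched roughly as follows in Python. This is
only an illustration under stated assumptions: the token classes (two-
character delimiters, words, whitespace runs, single characters) and the
names TWO_CHAR, prev_token and tokens_backward are made up for the
example, and deciding what is a comment or a string is left to a later
layer.

```python
TWO_CHAR = {"/*", "*/", "//"}  # assumed set of two-character tokens


def _is_word(ch):
    """Characters that make up identifier-like or numeric words."""
    return ch.isalnum() or ch == "_"


def prev_token(src, end):
    """Return (start, text) of the token that ends just before index end."""
    if end <= 0:
        raise ValueError("no token before the start of the input")
    # Prefer a two-character delimiter when one ends here.
    if end >= 2 and src[end - 2:end] in TWO_CHAR:
        return end - 2, src[end - 2:end]
    ch = src[end - 1]
    # Words and whitespace runs extend backwards over the same class.
    for same_class in (_is_word, str.isspace):
        if same_class(ch):
            start = end - 1
            while start > 0 and same_class(src[start - 1]):
                start -= 1
            return start, src[start:end]
    # Anything else (quotes, operators, ...) is a one-character token.
    return end - 1, ch


def tokens_backward(src, end=None):
    """Yield non-whitespace tokens from position end back to the start."""
    end = len(src) if end is None else end
    while end > 0:
        start, text = prev_token(src, end)
        if not text.isspace():
            yield text
        end = start


sample = 'x = "a" /* c */ y'
print(list(tokens_backward(sample)))
```

Note that the sketch prefers the longest delimiter ending at the current
position, so overlapping sequences such as /*/ are split differently than
a forward scan would split them; it makes no attempt to resolve that.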