Re: lexing backwards

"Stefan Monnier" <monnier+comp.compilers/news/@rum.cs.yale.edu>
15 Apr 2003 00:15:03 -0400

          From comp.compilers

Related articles
lexing backwards monnier+comp.compilers/news/@rum.cs.yale.edu (Stefan Monnier) (2003-04-05)
Re: lexing backwards haberg@math.su.se (2003-04-07)
Re: lexing backwards cfc@world.std.com (Chris F Clark) (2003-04-07)
Re: lexing backwards maratb@cs.berkeley.edu (Marat Boshernitsan) (2003-04-07)
Re: lexing backwards stan@zaborowski.org (Stan Zaborowski) (2003-04-13)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-04-13)
Re: lexing backwards monnier+comp.compilers/news/@rum.cs.yale.edu (Stefan Monnier) (2003-04-15)
Re: lexing backwards cfc@TheWorld.com (Chris F Clark) (2003-04-15)
Re: lexing backwards genew@mail.ocis.net (2003-05-06)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-05-14)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-05-16)
Re: lexing backwards genew@mail.ocis.net (2003-05-16)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-05-18)
[1 later articles]

From: "Stefan Monnier" <monnier+comp.compilers/news/@rum.cs.yale.edu>
Newsgroups: comp.compilers
Date: 15 Apr 2003 00:15:03 -0400
Organization: Compilers Central
References: 03-04-015 03-04-026 03-04-029
Keywords: lex
Posted-Date: 15 Apr 2003 00:15:02 EDT

> Chris, I was in total agreement with you until I got to thinking about
> comments. And in particular I am thinking about languages that do not
> have a closing comment delimiter but use end-of-line as the closing
> delimiter. Examples would be "//" in C++ and "#" in Perl.


Actually, w.r.t. parsing comments backwards, Emacs already has pretty
good code for it. It works almost 100% of the time and only "rarely"
needs to parse from the beginning of the file.


But yes, it's pretty tricky code: even though I've spent a lot of time
understanding, extending, and fixing it, I'm not at all confident that
the one known error (left in for performance/laziness reasons) is the
only one.


The error that was left in shows up in cases like


                // this is a funny C++ comment /* followed by
                normal code with a funny */ token.


In such a case, when lexing backwards, the */ will be taken as the end
of a C-style comment, whereas it should of course be lexed as two tokens.
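To make that failure concrete, here is a small Python sketch (not Emacs's actual code) that contrasts a hypothetical naive backward rule, which on hitting */ pairs it with the nearest preceding /* and never looks for //, with a plain forward scan of the same text. The names forward_comments and naive_backward_block_comment are made up for illustration, and string literals are ignored to keep it short.

```python
def forward_comments(src):
    """Return (start, end) spans of // and /* */ comments, lexing forward.
    String literals are not handled, to keep the sketch short."""
    spans = []
    i = 0
    while i < len(src):
        if src.startswith("//", i):
            # Line comment: runs to the next newline (or end of input).
            j = src.find("\n", i)
            j = len(src) if j == -1 else j
            spans.append((i, j))
            i = j
        elif src.startswith("/*", i):
            # Block comment: runs through the next */ (or end of input).
            j = src.find("*/", i + 2)
            j = len(src) if j == -1 else j + 2
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


def naive_backward_block_comment(src, end):
    """Hypothetical backward rule: given a */ ending at `end`, assume the
    comment starts at the nearest preceding /*. It never checks for //,
    which is the shortcut that goes wrong on the funny C++ example."""
    assert src.startswith("*/", end - 2)
    start = src.rfind("/*", 0, end - 2)
    return None if start == -1 else (start, end)


src = ("// this is a funny C++ comment /* followed by\n"
       "normal code with a funny */ token.\n")
end = src.index("*/") + 2
print(forward_comments(src))                   # only the // comment on line 1
print(naive_backward_block_comment(src, end))  # a bogus span crossing the newline
```

The forward scan reports a single line comment, while the backward guess produces a block-comment span that the forward lexer never sees; a correct backward lexer would have to check whether the /* it found is itself inside a // comment.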


> So you would have to parse back to the beginning of the program before
> you can decide that a single quote mark is part of a comment and not a
> string delimiter.


But if there's no string quote in the comment, you don't need to parse
from the beginning of the file to know that it's really a comment (it
could also be a comment-inside-a-string, but in that case you're lexing
from inside a lex-element, and you can't expect to lex locally and get
a correct answer anyway).
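That rule can be sketched for a #-style line comment under simplifying assumptions that are mine, not the post's: only " and ' delimit strings, strings never span lines, and there are no escapes or Perl-specific quoting constructs. If no quote follows the # on its line, the # cannot sit inside a string that closes on that line, so it must be in a comment. Otherwise the hypothetical helper falls back to a forward scan, which under these assumptions only has to start at the beginning of the line. A quote inside the comment, as in # it's, is exactly the case that defeats the local check.

```python
# Assumptions (illustrative, not from the post): only " and ' delimit
# strings, strings never span lines, no escapes, no other quoting forms.
QUOTES = "\"'"


def hash_is_comment_locally(line, i):
    """Look only at the text after line[i] == '#'. Returns True when the
    '#' must be in a comment, or None when a quote follows it and the
    question cannot be settled locally."""
    assert line[i] == "#"
    if not any(ch in QUOTES for ch in line[i + 1:]):
        # A string containing this '#' would need a closing quote later
        # on the same line, and there is none.
        return True
    return None


def hash_is_comment(line, i):
    """Local check first; otherwise lex forward from the start of the line,
    which suffices only because strings are assumed not to span lines."""
    local = hash_is_comment_locally(line, i)
    if local is not None:
        return local
    quote = None  # quote character of the string we are inside, if any
    for j in range(i + 1):
        ch = line[j]
        if quote is not None:
            if ch == quote:
                quote = None
        elif ch in QUOTES:
            quote = ch
        elif ch == "#":
            return True  # first unquoted '#' at or before i opens the comment
    return False  # line[i] was reached inside a string


line = "print \"a # b\";  # it's"
print(hash_is_comment_locally("x = 1  # fine", 7))      # True
print(hash_is_comment_locally(line, line.rindex("#")))  # None: "it's" has a quote
print(hash_is_comment(line, line.index("#")))           # False: inside the string
print(hash_is_comment(line, line.rindex("#")))          # True
```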


Besides, lexing Perl is already tremendously difficult even when going forward.


In any case, I'm probably more interested in the "easy" case, where
lex-elements are "simple". E.g., I'd like to lex backwards assuming that
things like /*, */, ", and ' are tokens (the handling of comments and
strings can be left to another layer).
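A minimal sketch of that easy case, assuming a made-up token set (words, a few operators, and the /*, */, ", ' delimiters as plain tokens): the hypothetical lex_backward scans from the end of the input and takes the longest token ending at the current position, and lex_forward is included only for comparison.

```python
# Hypothetical token set: words, a few operators, and the comment/string
# delimiters treated as ordinary tokens (no comment or string handling).
OPERATORS = {"/*", "*/", "++", "+", "*", "/", "=", ";", "(", ")", '"', "'"}
MAX_OP = max(len(op) for op in OPERATORS)


def is_word_char(ch):
    return ch.isalnum() or ch == "_"


def lex_backward(src):
    """Tokenize from the end of src toward the start, taking the longest
    token that ends at the current position. Returns tokens in source order."""
    tokens = []
    pos = len(src)
    while pos > 0:
        ch = src[pos - 1]
        if ch.isspace():
            pos -= 1
        elif is_word_char(ch):
            start = pos - 1
            while start > 0 and is_word_char(src[start - 1]):
                start -= 1
            tokens.append(src[start:pos])
            pos = start
        else:
            for n in range(MAX_OP, 0, -1):
                if n <= pos and src[pos - n:pos] in OPERATORS:
                    tokens.append(src[pos - n:pos])
                    pos -= n
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at {pos - 1}")
    tokens.reverse()
    return tokens


def lex_forward(src):
    """Ordinary left-to-right longest-match lexer, for comparison."""
    tokens = []
    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif is_word_char(ch):
            end = pos + 1
            while end < len(src) and is_word_char(src[end]):
                end += 1
            tokens.append(src[pos:end])
            pos = end
        else:
            for n in range(MAX_OP, 0, -1):
                if pos + n <= len(src) and src[pos:pos + n] in OPERATORS:
                    tokens.append(src[pos:pos + n])
                    pos += n
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at {pos}")
    return tokens


print(lex_backward("x = a /* y */ b;"))  # ['x', '=', 'a', '/*', 'y', '*/', 'b', ';']
print(lex_forward("a+++b"))   # ['a', '++', '+', 'b']
print(lex_backward("a+++b"))  # ['a', '+', '++', 'b']
```

Both directions agree on x = a /* y */ b; but not on a+++b, so longest match applied in reverse is not automatically equivalent to the forward lexer, even for this simple token set.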




                Stefan

