Re: lexing backwards

haberg@math.su.se (Hans Aberg)
7 Apr 2003 00:22:51 -0400

          From comp.compilers

Related articles
lexing backwards monnier+comp.compilers/news/@rum.cs.yale.edu (Stefan Monnier) (2003-04-05)
Re: lexing backwards haberg@math.su.se (2003-04-07)
Re: lexing backwards cfc@world.std.com (Chris F Clark) (2003-04-07)
Re: lexing backwards maratb@cs.berkeley.edu (Marat Boshernitsan) (2003-04-07)
Re: lexing backwards stan@zaborowski.org (Stan Zaborowski) (2003-04-13)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-04-13)
Re: lexing backwards monnier+comp.compilers/news/@rum.cs.yale.edu (Stefan Monnier) (2003-04-15)
Re: lexing backwards cfc@TheWorld.com (Chris F Clark) (2003-04-15)
[6 later articles]
| List of all articles for this month |

From: haberg@math.su.se (Hans Aberg)
Newsgroups: comp.compilers
Date: 7 Apr 2003 00:22:51 -0400
Organization: Mathematics
References: 03-04-015
Keywords: lex
Posted-Date: 07 Apr 2003 00:22:51 EDT

"Stefan Monnier" <monnier+comp.compilers/news/@rum.cs.yale.edu> wrote:


>Could anyone point me to work and experience on lexing text locally
>and backwards ?
>
>Traditionally lexing is done left-to-right with longest-match regexps.


Actually, it should have been termed "forward" and not left-to-right,
as languages that are written right-to-left are parsed the same way as
those written left-to-right: A language has an internal direction
"forward" which is independent on how it is written. Thus one can
easily mix languages internally that externally are written in
different directions. Knuth who invented this terminology "Lx"
parsing methods, evidently missed this point. It is a problem also in
the program TeX he wrote, that does not handle languages written
right-to-left. :-)


>But such a lexing system obviously means that if you're in the middle
>of a file and want to know what is the previous lexing token, you end
>up (in the general case) having to lex from the beginning of the file.
>
>Assuming one wants to be able to do such backward lexing and that one
>wants to get the answer locally (without having to go back to the
>beginning of the file), one needs to put some restrictions on the kind
>of tokens (use shortest-match regexp, or use a subset of regexps) one
>might accept, or one might just want to add some heuristics, or else
>use caching, or ...


I think that you should be able to mirror reverse the regular
expression itself, and then us that to construct a NFA/DFA/hybrid for
scanning backwards. The problem is when one tries to find the longest
match, scanning on lines etc., and I do not see exactly what happens
then.


>PS: This is in the context of basic Emacs support for major modes, so it
> is not specific to a particular grammar although it's OK if it
> does not cover all possible language syntaxes.


This kind of simple matches are often restricted to one line at a
time. Then it is probably simpler to scan backwards to successively
find each line, and then apply the normal, forward regular expression
to that.


    Hans Aberg * Anti-spam: remove "remove." from email address.
                                    * Email: Hans Aberg <remove.haberg@member.ams.org>
                                    * Home Page: <http://www.math.su.se/~haberg/>
                                    * AMS member listing: <http://www.ams.org/cml/>


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.