Re: Context sensitive scanner ?

Chris F Clark <cfc@world.std.com>
28 Nov 1997 01:01:05 -0500

          From comp.compilers

Related articles
Context sensitive scanner ? hat@se-46.wpa.wtb.tue.nl (Albert Theo Hofkamp) (1997-11-20)
Re: Context sensitive scanner ? johnm@non.net (1997-11-23)
Re: Context sensitive scanner ? pjj@cs.man.ac.uk (1997-11-23)
Re: Context sensitive scanner ? Mikael.Pettersson@sophia.inria.fr (Mikael Pettersson) (1997-11-23)
Re: Context sensitive scanner ? genew@vip.net (1997-11-23)
Re: Context sensitive scanner ? thetick@magelang.com (Scott Stanchfield) (1997-11-24)
Re: Context sensitive scanner ? cfc@world.std.com (Chris F Clark) (1997-11-28)
Re: Context sensitive scanner ? henry@zoo.toronto.edu (Henry Spencer) (1997-11-28)
Re: Context sensitive scanner ? ok@cs.rmit.edu.au (1997-11-29)
Re: Context sensitive scanner ? hat@se-46.wpa.wtb.tue.nl (Albert Theo Hofkamp) (1997-11-29)
Re: Context sensitive scanner ? thetick@magelang.com (Scott Stanchfield) (1997-11-30)
Re: Context sensitive scanner ? johnm@non.net (1997-11-30)
Re: Context sensitive scanner ? thetick@magelang.com (Scott Stanchfield) (1997-11-30)
[3 later articles]

From: Chris F Clark <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 28 Nov 1997 01:01:05 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 97-11-117 97-11-127 97-11-141
Keywords: lex

Mikael Pettersson <Mikael.Pettersson@sophia.inria.fr> wrote:
> Back in 1988, I wrote a compiler (a translator actually) for a proprietary
> Basic dialect, . . . had some "interesting" features:
. . . [details of context sensitivity elided]
> Now, one could imagine some awful hack, adding persistent (i.e., C
> "static") . . . , but it was an unmaintainable mess that never
> really worked well.


He then described his solution [which Scott Stanchfield seconded]:
> I eventually recast the problem as a stack of coroutines
> processing streams of tokens. That is:
> 1. A simple low-level scanner generates a stream of tokens.
> 2. A "transformation" scanner is a coroutine with an input stream
> and an output stream. It steps through its input stream
> looking for its particular kind of transformation. . . .
> 3. Push one transformation scanner for each individual kind
> of transformation. Keep each scanner as simple as possible.
> 4. The yacc-generated parser gets its input tokens from the
> output stream of the top-most transformation scanner.
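
As a concrete sketch of that stack (hypothetical types of my own, in
plain C, not code from Yacc++ or any real product): every stage,
including the base scanner, answers a next() request, and each
transformation stage pulls from the stage below it. The one
transformation shown is a keyword screener, so it also illustrates
DeRemer's scanner/screener split mentioned below.

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    enum kind { T_IDENT, T_NUMBER, T_KEYWORD, T_EOF };

    struct token { enum kind kind; char text[32]; };

    /* Every stage is a "source": a next() routine plus a pointer
       to the stage below it. */
    struct source {
        struct token (*next)(struct source *self);
        struct source *below;   /* NULL for the base scanner */
        const char *cursor;     /* used only by the base scanner */
    };

    /* 1. The simple low-level scanner: splits blank-separated
       words out of a string and classifies them crudely. */
    static struct token scan_next(struct source *self)
    {
        struct token t = { T_EOF, "" };
        const char *p = self->cursor;
        size_t n = 0;
        while (*p == ' ') p++;
        while (*p && *p != ' ' && n < sizeof t.text - 1)
            t.text[n++] = *p++;
        t.text[n] = '\0';
        self->cursor = p;
        if (n > 0)
            t.kind = isdigit((unsigned char)t.text[0])
                   ? T_NUMBER : T_IDENT;
        return t;
    }

    /* 2. A transformation scanner: pulls from its input stream and
       rewrites the tokens it recognizes (here, a screener that
       turns certain identifiers into keywords). */
    static struct token screen_next(struct source *self)
    {
        struct token t = self->below->next(self->below);
        if (t.kind == T_IDENT &&
            (strcmp(t.text, "if") == 0 || strcmp(t.text, "then") == 0))
            t.kind = T_KEYWORD;
        return t;
    }

    int main(void)
    {
        /* 3. Push one stage per kind of transformation. */
        struct source scanner  = { scan_next,   NULL,     "if x then 42" };
        struct source screener = { screen_next, &scanner, NULL };
        struct source *top = &screener;

        /* 4. The parser reads from the top-most stage; this loop
           stands in for the yacc-generated parser. */
        for (;;) {
            struct token t = top->next(top);
            if (t.kind == T_EOF) break;
            printf("%d '%s'\n", (int)t.kind, t.text);
        }
        return 0;
    }

Pushing another kind of transformation is just one more struct
wrapped around the top; none of the stages below it change.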


Re-scanning the token stream with a series of pattern matchers is a
very general, powerful, and clean solution to most context-sensitive
parsing problems. More people should be aware of it and use it. You
can see rudiments of it reappearing frequently. For example, Lisp
macros are described as transformational grammars. The C preprocessor
can be thought of as a transformation processor on the token stream.
More importantly, the C standard defines the early phases of
translation as a sequence of such transformations.


Frank DeRemer canonicalized at least one set of transformations when
he recommended splitting the lexing process into the scanner
(recognizing tokens) and the screener (distinguishing keywords).


In addition, there was a recent discussion here about recasting all
of parsing in transformational terms. That recasting introduces its
own problems by oversimplifying a different set of issues, but those
problems do not detract from the usefulness of the model when it is
applied to the problems it does address.


Unfortunately, most parsing systems don't encourage thinking about
parsing as sequences of transformations. Instead they promote the
view of one lexer and one parser.


In Yacc++, Barbara and I tried to extend that model by introducing an
"ignore" construct which could be used to describe phrases the parser
was supposed to treat as preprocessor directives. We also tried to
modularize the parser so that it could insert tokens into the stream
it was reading. However, those features are not sufficient to break
the conceptual stranglehold, and Yacc++ grammars still tend to be "two
pass" (lexer/parser) oriented, although occasionally we see users
using the features in a transformational way. (Of course, I must
admit that our documentation does not exactly push transformational
grammars as "the solution" either, since it is just one of many good
techniques. In addition, some of the details of using Yacc++
transformationally are probably messy, since that part of the product
has not been stressed.)


I now think that having a separate description of the extra pass(es)
would be a better solution, at least for some cases. For one thing,
it would make the concept of transformations more transparent.
Someday we will add that feature to Yacc++.


However, transformational passes have a price, and there are issues
to consider; two of them come immediately to mind.


1) If a transformational pass is implemented as an actual separate
pass over the token stream, there are performance issues to consider,
as the additional pass adds overhead. On the other hand, merging
passes must be done carefully to preserve the correct semantics.
(The AZ filter fusion work looks interesting in this regard.)


2) Some transformational passes must be run serially, others
recursively. The specification language needs to provide ways to
describe a variety of pass orderings. ANSI C macros are an excellent
example: macros are applied recursively, except that a macro already
being expanded is painted blue and does not reapply to itself.
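
Here is a toy sketch of that rule (a made-up macro table, not the
real preprocessor's data structures): the expander carries the set of
names currently painted blue, and a blue name is emitted verbatim
instead of being re-expanded, so even a self-referential macro
terminates.

    #include <stdio.h>
    #include <string.h>

    struct macro { const char *name; const char *body[4]; };

    /* FOO expands to "BAR FOO": without blue paint, expanding
       FOO would recurse forever. */
    static const struct macro macros[] = {
        { "FOO", { "BAR", "FOO", NULL } },
        { "BAR", { "baz", NULL } },
    };

    static const struct macro *lookup(const char *name)
    {
        size_t i;
        for (i = 0; i < sizeof macros / sizeof macros[0]; i++)
            if (strcmp(macros[i].name, name) == 0)
                return &macros[i];
        return NULL;
    }

    /* Expand one token; blue[0..nblue-1] holds the names that
       are currently painted blue. */
    static void expand(const char *tok, const char **blue, size_t nblue)
    {
        const struct macro *m = lookup(tok);
        const char *blue2[16];
        size_t i;

        for (i = 0; m && i < nblue; i++)
            if (strcmp(blue[i], tok) == 0)
                m = NULL;            /* painted blue: leave it alone */
        if (m == NULL) {
            printf("%s ", tok);      /* not a macro, or blue: emit */
            return;
        }
        for (i = 0; i < nblue; i++)  /* copy the blue set ... */
            blue2[i] = blue[i];
        blue2[nblue] = tok;          /* ... and paint this name */
        for (i = 0; m->body[i]; i++) /* rescan the replacement */
            expand(m->body[i], blue2, nblue + 1);
    }

    int main(void)
    {
        expand("FOO", NULL, 0);      /* prints: baz FOO */
        printf("\n");
        return 0;
    }

The inner FOO survives unexpanded, which is the behavior the
standard's rescanning rules require.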


One thing worth noting about both issues is that they stress the
importance of understanding the semantics of applying transformations
consecutively (composing them in the mathematical sense). Sometimes
the transformations interact to give surprising results. At other
times, you want the transformations to interact and a simple
implementation does not produce the correct combined effects. Getting
the right transformations in the right order and having them interact
in the right ways can require great subtlety. This is particularly
true when you are attempting to do so "efficiently".


Hope this helps,
-Chris


*****************************************************************************
Chris Clark                  Internet   :  compres@world.std.com
Compiler Resources, Inc.     CompuServe :  74252,1375
3 Proctor Street             voice      :  (508) 435-5016
Hopkinton, MA  01748  USA    fax        :  (508) 435-4847  (24 hours)
------------------------------------------------------------------------------
Web Site in Progress:        Web Site   :  http://world.std.com/~compres
--

