critique of scanner and parser generators

moss@cs.umass.edu (Eliot Moss)
3 Feb 92 14:12:38 GMT

          From comp.compilers

Newsgroups: comp.software-eng,comp.object,comp.compilers
From: moss@cs.umass.edu (Eliot Moss)
In-Reply-To: objsys@netcom.com's message of 31 Jan 92 03:44:08 GMT
Keywords: OOP, LALR, design
Organization: Dept of Comp and Info Sci, Univ of Mass (Amherst)
References: <1992Jan31.034408.6889objsys@netcom.COM>
Date: 3 Feb 92 14:12:38 GMT

>>>>> On 31 Jan 92 03:44:08 GMT, objsys@netcom.com (Bob Hathaway) said:
> My experience with code generators is with parser and scanner generators,
> which I think provide an important lesson for anyone using code
> generators. Long ago, what was considered difficult to impossible is
> done now routinely in dual-level courses. But, they invented parser and
> scanner generators to simplify things at a time when this was considered
> very difficult. People such as Wirth molded grammars into LL(1), however
> the rest of the world molded their grammars into LALR(1) (on Unix anyway)
> to take advantage of these parser/scanner code generators. [...] But
> note something that slipped by almost the entire computer science and
> user community. These languages, which comprise almost all modern
> computer languages in use today (Ada, C, C++, Eiffel, Modula, Pascal,
> ...) were molded to what these tools provided, simple context free
> grammars (with hacks to accommodate slightly different grammars in C and
> C++). An interesting point is that these languages are all context
> sensitive, again hacked with semantics embedded into code. Some think
> attribute grammars changed this, they did not. They simply formalized
> the embedded code (semantics) and still use context free grammars.
> Herein then lies one of the most hidden and widespread mindset problems
> in all of computing today, as any natural language translation person can
> tell you, the fact that all of our language grammars are not designed to
> best suit us, but to fit into the mold provided by our tools, the parser
> and scanner generators! These tools are fairly fast, but since almost no
> one questions them and considering the time/man years spent on making
> them fast, this efficiency issue isn't really clear.


I would tend to agree that the speed of the tools (scanner and parser
generators) is not necessarily a big issue, though having them reasonably
fast certainly helps out the students in my compiler course. And
personally, I don't think scanner generators are that big a deal --
production compilers usually have hand-written scanners for speed and/or
to handle quirks of the language. Since the language tends not to change
rapidly, this is generally acceptable.
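
To give a feel for how little code a hand-written scanner needs, here is a
minimal sketch in Python. It assumes a toy language of identifiers,
integers, and a few single-character operators; the function name `scan`
and the token kinds IDENT, INT, and OP are invented for illustration and
do not come from any particular compiler.

```python
def scan(text):
    """Hand-written scanner for a toy language (hypothetical token set).

    Returns a list of (kind, lexeme) pairs.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            # Skip whitespace between tokens.
            i += 1
        elif c.isalpha() or c == "_":
            # Identifier: a letter or underscore, then letters/digits/underscores.
            start = i
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("IDENT", text[start:i]))
        elif c.isdigit():
            # Integer literal: a run of digits.
            start = i
            while i < n and text[i].isdigit():
                i += 1
            tokens.append(("INT", text[start:i]))
        elif c in "+-*/()=":
            # Single-character operators and punctuation.
            tokens.append(("OP", c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
    return tokens
```

Language quirks (nested comments, context-dependent keywords, and so on)
would show up as extra branches in the loop, which is the kind of special
case that is awkward to express in a generator's input.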


It is certainly true that LL and LR parsing techniques have shaped
language design. You seem to feel that this is somehow a bad thing.
Personally, I think it probably leads to more uniformity in a language
design, which makes the language easier to write and to read, i.e., that
there are substantial software engineering benefits. Additionally, the
LR(k) languages are exactly the deterministic context-free languages, the
largest class that can be parsed deterministically without backtracking.
While speed of a parser generator
may be a minor issue, speed of a compiler is rather more important, and
the linear cost of modern parsing contributes to that speed. I would also
argue that if a machine requires backtracking to disambiguate, then you're
probably taxing a human's cognitive abilities when reading the code, too.
LALR is a (small) step back from full LR; the difference probably has
little practical effect on language design.
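
The point about deterministic, linear-time parsing can be illustrated with
a small sketch. The Python code below is a hand-written predictive
(LL(1)) recursive-descent parser, not an LR or LALR table-driven one; it
is used here only because it is short. The grammar, the function name
`parse`, and the "$" end marker are assumptions made for the example.
Every decision is made by looking at the next token only, and each token
is consumed exactly once, so there is no backtracking.

```python
import re

def parse(text):
    """Evaluate an arithmetic expression with one token of lookahead.

    Toy grammar (hypothetical):
        expr   -> term (("+" | "-") term)*
        term   -> factor ("*" factor)*
        factor -> INT | "(" expr ")"
    """
    # Crude tokenizer for the sketch: characters it does not recognize
    # are silently skipped. "$" marks the end of input.
    tokens = re.findall(r"\d+|[-+*()]", text) + ["$"]
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() in ("+", "-"):      # the next token alone picks the branch
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():
        tok = advance()
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

For instance, `parse("2 + 3 * (4 - 1)")` evaluates to 11. A grammar that
needed unbounded lookahead to choose between alternatives could not be
handled this way without trying a branch and undoing it on failure.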


Let me make one of my points more explicit: ultimately, in large programs,
readability is probably more important than writability. This argues for
simplicity and minimum possibilities for ambiguity. I am cross-posting to
comp.compilers since this seems relevant there, too.
--


J. Eliot B. Moss, Assistant Professor
Department of Computer Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

