Why separate the lexer and parser?

"Mark Hopkins" <mark@omnifest.uwm.edu>
Sun, 9 Oct 1994 17:06:43 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: "Mark Hopkins" <mark@omnifest.uwm.edu>
Keywords: lex, yacc, design, comment
Organization: Compilers Central
References: 94-10-028
Date: Sun, 9 Oct 1994 17:06:43 GMT

      Generally you'll keep the lexical scanner separate in order to
modularise error-handling. Lexical errors are trapped and handled before
the rest of the processor has a chance to see them, so the syntax analyser
only needs to deal with a clean and consistent interface. In most
imperative programming languages you can do much the same thing
with the syntax for expressions, for declarations and for statements.
You'll run into a few cases, as in C, where labeled statements have to be
distinguished from assignment statements (you have to look ahead), but
that's no major deal.
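
To make the division of labour concrete, here is a minimal sketch in Python. It is an illustration only, not taken from any real compiler: the token set, the names `scan`, `LexError` and `classify_statement`, and the tiny C-like statement language are all assumptions made up for the example. The scanner rejects bad characters before the syntax analyser sees anything, and the statement classifier uses one token of lookahead to tell a labeled statement from an assignment.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # "IDENT", "NUMBER", "OP" or "EOF"
    text: str
    pos: int


class LexError(Exception):
    """Raised by the scanner, so the parser never sees a malformed token."""


# Hypothetical token set for a tiny C-like statement language.
_WS = re.compile(r"\s*")
_TOKEN = re.compile(
    r"(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[:=;+\-*/()])"
)


def scan(source: str) -> List[Token]:
    """Turn source text into tokens, trapping lexical errors here."""
    tokens = []
    pos = _WS.match(source, 0).end()  # skip leading blanks
    while pos < len(source):
        m = _TOKEN.match(source, pos)
        if m is None:
            raise LexError(f"bad character {source[pos]!r} at offset {pos}")
        tokens.append(Token(m.lastgroup, m.group(), pos))
        pos = _WS.match(source, m.end()).end()  # skip blanks after the token
    tokens.append(Token("EOF", "", pos))
    return tokens


def classify_statement(tokens: List[Token]) -> str:
    """Look one token past a leading identifier to pick the statement kind."""
    if tokens[0].kind == "IDENT" and len(tokens) > 1:
        lookahead = tokens[1]
        if lookahead.text == ":":
            return "labeled"
        if lookahead.text == "=":
            return "assignment"
    return "other"
```

With these definitions, `classify_statement(scan("done: x = 1;"))` takes the labeled branch and `classify_statement(scan("x = y + 1;"))` the assignment branch, while `scan("x = $;")` raises `LexError` before any syntax analysis starts.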


      Otherwise there's no real reason to keep them separate. A syntax is
just a syntax. The moderator's note to the contrary is confuted by his
implementation of a Fortran subset. Indeed, in this case, the Fortran 77
syntax IS specified as one monolithic whole, and attempts to separate out
the lexical part are only going to create needless difficulties down the
line.


    (That may also be interpreted as a case-in-point argument for designing
languages that allow the easy separation of the lexical scanner.)

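A well-known illustration of why Fortran 77 resists a separate scanner (it is not from the post above, only a standard textbook case): blanks are not significant in fixed-form source, so `DO 10 I = 1, 10` begins a loop while `DO 10 I = 1.10` assigns to a variable named `DO10I`. No fixed-size prefix tells the two apart; the deciding comma lies to the right of the `=`. The sketch below, in Python, is a deliberately simplified classifier. The name `classify_f77` is hypothetical, and the code ignores character constants, Hollerith data and everything else a real front end must handle.

```python
def classify_f77(statement: str) -> str:
    """Classify a statement that starts with DO (simplified sketch).

    Returns "do-loop", "assignment", or "other" for statements that
    do not start with the letters DO or contain no "=".
    """
    # Assumption: no character constants, so every blank can be dropped.
    s = statement.replace(" ", "").upper()
    if not s.startswith("DO") or "=" not in s:
        return "other"

    # Scan the text after the first "=" for a comma outside parentheses.
    rhs = s.split("=", 1)[1]
    depth = 0
    for ch in rhs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return "do-loop"  # "DO 10 I = 1, 10"
    return "assignment"       # "DO10I = 1.10" or "DO10I = MAX(1, 2)"
```

The point of the sketch is that the classification needs the whole statement, which is the kind of decision one would normally expect a parser, not a scanner, to make.
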
[Well, in fact, my Fortran subset parser does use yacc. Works pretty well.
There's quite a lot of lexical feedback from the parser to the lexer (which
is not written in lex, of course), but yacc is a lot easier to use than the
ad-hoc alternative. Fifteen years ago I wrote an entire Fortran 77 compiler
(called INfort, for anyone who remembers it) so I know that the technique
really does work for parsing all of F77. -John]