From: Ivan Godard <ivan@ootbcomp.com>
Newsgroups: comp.compilers
Date: Mon, 24 Feb 2014 15:04:49 -0800
Organization: A noiseless patient Spider
References: 14-02-018 14-02-021 14-02-023 14-02-030 14-02-031
Keywords: assembler
Posted-Date: 24 Feb 2014 22:13:15 EST
> Sorry, but even if Flex/Bison are difficult to set up, writing a parser
> by hand is not a good way to work.
> Tools like Flex/Bison give several advantages that you cannot get by
> writing a parser by hand: ...
> In other words, saying that writing a parser by hand is better is the
> same as saying "why use object-oriented programming? I can do the same
> in C with structs!" Yes, but the result is less readable, and the
> compiler does fewer checks.
> [Yacc or bison provides the pleasant assurance that the language your
> program parses is exactly the one in the grammar, and not some
> superset due to missing error checks. -John]
Well, not really.
You see, there is more to a parser (and a lexer; you seem to be
conflating the two) than parsing. It is rather rare for a parser to
receive a syntactically correct program as input; errors are more
common than correct code. So the parser must not only recognize
correct sentences; it must also do something useful with incorrect ones.
"Syntax error: right-reduction stack underflow" is not useful.
You can avoid the problem by introducing error states into your
grammar. Unfortunately, it turns out that LR and LALR parsing is not
the way natural languages work (German possibly excepted), and a
reasonably complete set of useful error states produces a hopelessly
unmaintainable generated parser, or blows up the generator.
Programming languages, even assembler, have always tended to be LL(1)
to a first approximation. As a result, production compilers intended
for use by those unfamiliar with the internals have tended to be LL(1)
too, and usually recursive descent. The advantage of RD is that the
semantics can be done on the fly and used to guide the parse when the
actual language is not a CFG (and most of them are not, to at least
some degree), without exponential explosion of state.
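
To make that concrete, here is a minimal sketch of the RD style in C.
This is my illustration, not code from any production compiler; the toy
grammar and every name in it are hypothetical. The point to notice is
that the semantics (here, just evaluation) happen during the parse, and
that the error message can say something useful. Lexing is folded into
the parser purely for brevity; the later point about keeping a separate
lexer still stands.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical toy grammar, LL(1):
     *   expr   -> term (('+'|'-') term)*
     *   term   -> factor (('*'|'/') factor)*
     *   factor -> NUMBER | '(' expr ')'
     * Semantics (evaluation) are done on the fly; no tree is built.
     */
    static const char *src;                 /* input cursor */

    static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

    static void fail(const char *msg) {     /* a useful message, with context */
        fprintf(stderr, "syntax error: %s near \"%.10s\"\n", msg, src);
        exit(1);
    }

    static double parse_expr(void);         /* forward declaration */

    static double parse_factor(void) {
        skip_ws();
        if (*src == '(') {                  /* '(' expr ')' */
            src++;
            double v = parse_expr();
            skip_ws();
            if (*src != ')') fail("expected ')'");
            src++;
            return v;
        }
        if (isdigit((unsigned char)*src)) { /* NUMBER */
            char *end;
            double v = strtod(src, &end);
            src = end;
            return v;
        }
        fail("expected a number or '('");
        return 0;                           /* not reached */
    }

    static double parse_term(void) {
        double v = parse_factor();
        for (;;) {
            skip_ws();
            if      (*src == '*') { src++; v *= parse_factor(); }
            else if (*src == '/') { src++; v /= parse_factor(); }
            else return v;
        }
    }

    static double parse_expr(void) {
        double v = parse_term();
        for (;;) {
            skip_ws();
            if      (*src == '+') { src++; v += parse_term(); }
            else if (*src == '-') { src++; v -= parse_term(); }
            else return v;
        }
    }

    int main(void) {
        src = "2 * (3 + 4) - 5";
        printf("%g\n", parse_expr());       /* prints 9 */
        return 0;
    }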
In contrast, a parser-generator only gets you a syntax tree. You still
must walk the tree doing semantics and real work - and guess what,
that tree walker is the same code you had to write for an RD parser,
but the RD parser didn't require you to deal with an inhumane tool. In
short, YACC (and Bison) looked like a good idea, and are in fact
usable for toy languages and student exercises, but don't use them for
real work.
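
For contrast, a sketch (mine, with hypothetical node types) of the
post-parse tree walker for the same toy grammar. It is the same case
analysis and the same recursion as parse_expr and friends above,
written a second time:

    /* Hypothetical AST node types for the same toy grammar. */
    typedef enum { NODE_NUM, NODE_ADD, NODE_SUB, NODE_MUL, NODE_DIV } NodeKind;

    typedef struct Node {
        NodeKind kind;
        double value;                /* NODE_NUM only */
        struct Node *lhs, *rhs;      /* binary nodes only */
    } Node;

    /* Same case analysis, same recursion as the RD parser above -
     * having written it once in the parser, you write it again here. */
    double eval(const Node *n) {
        switch (n->kind) {
        case NODE_NUM: return n->value;
        case NODE_ADD: return eval(n->lhs) + eval(n->rhs);
        case NODE_SUB: return eval(n->lhs) - eval(n->rhs);
        case NODE_MUL: return eval(n->lhs) * eval(n->rhs);
        case NODE_DIV: return eval(n->lhs) / eval(n->rhs);
        }
        return 0;                    /* unreachable */
    }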
That's for parsers. Lexers are a different matter. First rule: never
use a parser for what a lexer would do. Your language description (for
the parser) should never take terminal productions to the character
level. Think about what your combined-lex-and-parse grammar looks
like if you want it to accept UTF-16 or Danish keyboards - again,
issues that don't come up in toys, but matter very much in real
software.
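
A sketch of that boundary, with hypothetical token names: the parser's
whole alphabet is a handful of token codes, and exactly one function
ever looks at characters, so a change of input encoding touches nothing
else. ASCII is assumed here only to keep the sketch short.

    #include <ctype.h>

    /* Hypothetical token codes: the parser's entire alphabet. */
    typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PLUS, TOK_LPAREN,
                   TOK_RPAREN, TOK_EOF, TOK_ERROR } Token;

    static const char *in;           /* input cursor; ASCII for brevity */

    /* The ONLY function that looks at characters.  Accepting UTF-16,
     * or locale-specific input, means changing this function alone;
     * the grammar above it never drops below the token level. */
    Token next_token(void) {
        while (isspace((unsigned char)*in)) in++;
        if (*in == '\0') return TOK_EOF;
        if (isalpha((unsigned char)*in)) {
            while (isalnum((unsigned char)*in)) in++;
            return TOK_IDENT;
        }
        if (isdigit((unsigned char)*in)) {
            while (isdigit((unsigned char)*in)) in++;
            return TOK_NUMBER;
        }
        switch (*in++) {
        case '+': return TOK_PLUS;
        case '(': return TOK_LPAREN;
        case ')': return TOK_RPAREN;
        default:  return TOK_ERROR;
        }
    }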
Secondly: as a rough rule of thumb, of all the work done by a compiler
at -O0, half will be in the lexer - it's the only place where work is
done character-by-character and not at a coarser granularity like
lexeme; there are *many* more characters in a program than lexemes.
Consequently efficiency of execution matters in the lexer, and
generated lexers of the Flex sort are very inefficient. Sure, the
state transitions look efficient, but you haven't thought about cache
behavior, nor about copy costs, symbol table look-up and so on. If
your lexer isn't doing that work too, then it is nothing more than a
keyword detector, whereas it should be very much more.
A good lexer will look at each character exactly once to do
everything: isolate the containing lexeme, find a symbol table entry
for it (including program-declared symbols, not just keywords), and
enter the symbol in the symbol table if it has not been seen
before. Once! Kudos if your lexer, on encountering a bogus lexeme,
can find the nearest misspelling currently in the symbol table, emit a
diagnostic, and proceed with the assumed correction.
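
One way the single pass can go, as a toy sketch of mine rather than
anything from the post: hash the identifier (FNV-1a here) as its
characters stream by, so isolating the lexeme, looking it up, and
interning it all share the one scan over the characters.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define TABLE_SIZE 4096          /* power of two; open addressing */

    typedef struct { char *name; /* ...type, scope, etc... */ } Symbol;
    static Symbol table[TABLE_SIZE];

    /* Scan one identifier starting at *p, computing an FNV-1a hash as
     * the characters stream by: each character is touched exactly once,
     * serving both lexeme isolation and symbol lookup.  Interns the
     * symbol if it has not been seen before.  (Toy: no table-full or
     * out-of-memory handling.) */
    static Symbol *lex_ident(const char **p) {
        const char *start = *p;
        unsigned long h = 2166136261UL;
        while (isalnum((unsigned char)**p) || **p == '_') {
            h = (h ^ (unsigned char)**p) * 16777619UL;   /* FNV-1a step */
            (*p)++;
        }
        size_t len = (size_t)(*p - start);
        for (size_t i = h & (TABLE_SIZE - 1); ; i = (i + 1) & (TABLE_SIZE - 1)) {
            if (table[i].name == NULL) {                 /* first sighting */
                char *s = malloc(len + 1);
                memcpy(s, start, len);
                s[len] = '\0';
                table[i].name = s;
                return &table[i];
            }
            if (strncmp(table[i].name, start, len) == 0
                && table[i].name[len] == '\0')
                return &table[i];                        /* already interned */
        }
    }

    int main(void) {
        const char *src = "foo bar foo";
        Symbol *a = lex_ident(&src); src++;              /* skip the space */
        Symbol *b = lex_ident(&src); src++;
        Symbol *c = lex_ident(&src);
        printf("%s %s %s; \"foo\" interned once: %s\n",
               a->name, b->name, c->name, a == c ? "yes" : "no");
        return 0;
    }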
You may feel that modern machines are fast enough that lexer (and
compiler) performance doesn't matter. If so you are right, for toys;
toys in this context include programs with < 100KLOC. Real programs,
needing real compilers, start at > 1MLOC and go up quite a ways from
there. And only toys can use separate compilation to avoid compile
time; production test and verification recompile the World from
scratch source. Frequently. Of course, test and verification are not
common in toys.
Just my opinion, mind you, albeit an opinion based on 11 compilers I
have done over the years, for a variety of languages and targets. Your
mileage may vary.