Related articles |
---|
[14 earlier articles] |
Re: is lex useful? Scott.Nicol@infoadvan.com (1996-06-27) |
Re: is lex useful? Scott.Nicol@infoadvan.com (1996-06-27) |
Re: is lex useful? 72510.2757@CompuServe.COM (Stephen Lindholm) (1996-06-27) |
Re: is lex useful? kanze@lts.sel.alcatel.de (1996-06-27) |
Re: is lex useful? bart@time.cirl.uoregon.edu (1996-06-30) |
Re: is lex useful? Robert.Corbett@Eng.Sun.COM (1996-06-30) |
Re: is lex useful? leichter@smarts.com (1996-06-30) |
Re: is lex useful? trd@murlibobo.cs.mu.OZ.AU (1996-06-30) |
Re: is lex useful? WStreett@shell.monmouth.com (1996-06-30) |
Re: is lex useful? dmr@bell-labs.com (1996-06-30) |
Re: is lex useful? clark@quarry.zk3.dec.com (1996-07-01) |
Re: is lex useful? bromage@cs.mu.OZ.AU (1996-07-02) |
Re: is lex useful? kanze@lts.sel.alcatel.de (1996-07-02) |
[2 later articles] |
From: | leichter@smarts.com (Jerry Leichter) |
Newsgroups: | comp.compilers |
Date: | 30 Jun 1996 16:43:42 -0400 |
Organization: | Compilers Central |
References: | 96-06-101 96-06-123 96-06-141 |
Keywords: | lex |
The real reason for using something like F?LEX to write your
lexical analyzer is that someone (maybe yourself) 5 or 10
years down the road, won't be looking at hand-carved code,
with a need to make a change/bug fix/etc., wondering, "Just
what the hell was the guy who wrote this trying to do?"
I hate to say it, but ... looking at your RE, I wouldn't know what you were
trying to do. In fact, I misread it at first and thought it would allow a
double underscore at the end.
(The RE was:
[A-Z]([A-Z0-9]|(_[A-Z0-9]))*(_)?
producing it requires re-interpreting the description - letter, followed by
letters, digits or underscores, but with no two consecutive underscores - as
"letter, followed by sequence of sub-pieces consisting of letters and digits
and separated by underscores". While the extension of the two descriptions
happens to be the same, the descriptions themselves are very different, and I
doubt anyone reading this RE would quickly produce the original rather
obvious English.)
RE's are not always a particularly nice way to describe things. Yes, you got
that one. Now try: Letter, followed by letters or digits or underscores,
but with no more than 10 underscores in a row. The C code for that is
essentially identical to the previous case. The RE is rather long....
I've always found RE's to be one of those "If all you have is a hammer,
every- thing looks like a nail" solutions. Yes, they can describe some
things very nicely and compactly. But they are completely inadequate for
describing many other kinds of things that, on the surface, seem just a
simple. (Nested comments are a classic example.)
Of course, it's easy to extend RE's to cover many particular oddball cases.
For example, a notation for "0 to n copies" - bounded closure - would be
convenient and easy to handle. (The resulting machine might get big in the
standard FSM implementation, but who cares?) However, the syntax of RE's -
which makes them so simple and compact - is hard to extend in this way
without losing its good features. And where to you stop? "1 to n copies"
is just as important as "0 to n copies", and after all we often have both *
and + operators.
For quicky one-off searches, RE's require minimal typing. You'll never look
at them again, so who cares how complicated they are? For cases where I'll
want to go back later and understand the thing, complex patterns are usually
much more clearly described using grammars - where you get to name
sub-pieces and re-use them; and there's no reason not to use some kind of
EBNF notation, which gives you back RE's on the right hand side of
productions for use when appropriate. Any decent parser generator will
produce something with a performance close to an FSM's, given a reasonable
grammar for a regular language.
But I'll still maintain that the simple cases are often at least as clear in
decent procedural code.
-- Jerry
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.