Re: Buffered input for a lexer?

Chris F Clark <cfc@world.std.com>
25 Mar 2002 01:15:40 -0500

From comp.compilers

Related articles
Buffered input for a lexer? sabre@nondot.org (Chris Lattner) (2002-03-24)
Re: Buffered input for a lexer? zackw@panix.com (Zack Weinberg) (2002-03-24)
Buffered input for a lexer? cfc@world.std.com (Chris F Clark) (2002-03-24)
Re: Buffered input for a lexer? sabre@nondot.org (Chris Lattner) (2002-03-24)
Re: Buffered input for a lexer? sabre@nondot.org (Chris Lattner) (2002-03-24)
Re: Buffered input for a lexer? rhyde@cs.ucr.edu (Randall Hyde) (2002-03-25)
*Re: Buffered input for a lexer? cfc@world.std.com (Chris F Clark)* (2002-03-25)**
Re: Buffered input for a lexer? clint@0lsen.net (2002-03-31)
Re: Buffered input for a lexer? sabre@nondot.org (Chris Lattner) (2002-03-31)
Re: Buffered input for a lexer? sabre@nondot.org (Chris Lattner) (2002-03-31)
Re: Buffered input for a lexer? joachim_d@gmx.de (Joachim Durchholz) (2002-03-31)
Re: Buffered input for a lexer? cgweav@aol.com (2002-03-31)
Re: Buffered input for a lexer? bear@sonic.net (Ray Dillinger) (2002-04-10)
[11 later articles]

| List of all articles for this month |

From:	Chris F Clark <cfc@world.std.com>
Newsgroups:	comp.compilers
Date:	25 Mar 2002 01:15:40 -0500
Organization:	Compilers Central
References:	02-03-162
Keywords:	lex, design
Posted-Date:	25 Mar 2002 01:15:40 EST

Chris Lattner further asked:
> . . . , but you
> bring up an interesting point that I have been wondering about: what is
> the advantage of the push style lexer/parser over the typical pull style
> lexer? I would think that there would be performance implications of
> having a push style lexer, and that most common uses of lexers don't
> require a push style interface...

The performance impact of a push v. a pull style lexer is negligible.
It is below the threshold of what we've been able to ever measure. We
occassionally benchmark Yacc++ lexers v. Flex ones. Generally, the
results are indistinguishable. (Recently, one of our customers found
a case that wasn't true. We found it had to do with the sentinel
code, which is why we changed our lexer to merge techniques 1 & 3.
That wiped out the performance deficit and in most cases made our
lexers slightly faster.) Note, our original end-of-buffer code was
essentially your technique 3, but because of other factors in how the
end of buffer was checked, it was not a major performance issue.

> Given your practical experience with them, what would you say are the
> strengths and weaknesses of the approach?

We originally did push style lexers because we wanted to use them in
window-system style applications--i.e. those things driven by one big
event loop. However, that turned out not to be a major driving
factor and at best a small niche in the uses of our tool.

Because we knew that most programmers are more familiar with the pull
style interface, we wrap our push driven model in a fake-pull driven
shell. Thus, while the internals of the system are turned upside
down, the input object drives the lexer object which drives the parser
object which drives the AST objects, to the casual user it is all done
in the guise of one call to "yy_psr()".

There are some interesting things one can do with a push style
interface that can't be done (easily) with a pull style interface,
especially if one thinks in terms of co-routine programming. However,
most of them are at the level of parlor tricks and not that important
in real applications.

The one notable exception to that are the people using our system to
do real event-driven state machine models. They need the push model
and do some very interesting things with it. The push models combines
very well with objects to make little machines that you can
run/pause/save/restore/resume/.... all in parallel. One of our
customers reports creating thousands of objects running little state
machines to implement some networking protocol.

----------------------------------------------------------------------

More important are the details of the implementation. Those places
where our code is even subtly different than [f]lex and yacc/bison can
cause various hacks that people have developed over the years to fail.
Some of the hacks we have "better" solutions for, but it is still a
nuiscance. Of course, we feel that you do get something in exchange
for losing some of those hacks (and presumably so do our customers).
Still, our happiest customers are not those that started with some
legacy lex-and-yacc application, but ones that were building something
from scratch using an o-o/C++ mindset.

I believe it was Guy Steele who said, "Good is the enemy of better."
Yacc and lex are the personification of that. It is real hard to
build a "better" pair of tools, because it is difficult to capture ALL
that is good in that pair.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Buffered input for a lexer?

Chris F Clark <cfc@world.std.com>25 Mar 2002 01:15:40 -0500

Chris F Clark <cfc@world.std.com>
25 Mar 2002 01:15:40 -0500