Re: Re: yacc - rules of thumb for speed of parsing?

"Quinn Tyler Jackson" <qjackson@wave.home.com>
30 Jul 1998 23:12:28 -0400

          From comp.compilers

Related articles
Re: Re: yacc - rules of thumb for speed of parsing? qjackson@wave.home.com (Quinn Tyler Jackson) (1998-07-30)
| List of all articles for this month |

From: "Quinn Tyler Jackson" <qjackson@wave.home.com>
Newsgroups: comp.compilers
Date: 30 Jul 1998 23:12:28 -0400
Organization: Compilers Central
Keywords: parse, performance

>I agree with John's comment on action code, and generally you'll spend far
>more time scanning than parsing, too...
[...]
>[That last one's important -- if you're not careful, your lexer will
>be the slowest part of the entire compiler. -John]


This brought back fond memories....


When I first implemented Laleh's Pattern Matcher in C++, I was
concerned that most of the time would be spent in the lexer, but
profiling done on intial C++ implementations of LPM showed that no
less than 50% of the total match time was actually spent in ::new and
::delete, making and breaking typically tiny temporary copies of
strings. I was able to reduce this *considerably* by reimplementing
the CShuString class using a reference counted model (some strings are
copied many thousands of times over during deeply recursive patterns).
Although probably not as important for compiler implementation as for
pattern matching, I have also found that scanning strings against
patterns that don't match is significantly more costly than scanning
strings for patterns that do match somewhere - anywhere. It takes LPM
considerably longer to know it's missed than to know it's hit except
with simple literal patterns like ['dog'$


LPM is entirely object oriented - except for the lexical scan, which
uses a table-based FSM and switch statements. I had considered
reimplementing the lexical phase as OO, just to stick with the
paradigm, but when I queried Stroustrup on his opinion on doing this,
he mentioned early work on Beta that showed such a performance hit
with OO lexical scanning that I changed my mind.


I may go back to the notion of an object oriented lexical phase,
despite the performance issues, but only because my current thinking
is to rework LPM to allow it to find patterns in streams that may or
may not be made up of statically wide char's. (For instance,
bitstreams, streams where one 'char' is actually one node of a tree,
etc.)


--
Quinn Tyler Jackson


email: qjackson@wave.home.com
url: http://www.qtj.net/~quinn/
ftp: qtj.net
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.