Related articles |
---|
The case of directly-executable parsers momchil.velikov@gmail.com (2006-01-28) |
Re: The case of directly-executable parsers derek@_NOSPAM_knosof.co.uk (Derek M. Jones) (2006-01-28) |
Re: The case of directly-executable parsers momchil.velikov@gmail.com (momchil.velikov@gmail.com) (2006-01-30) |
Re: The case of directly-executable parsers clint@0lsen.net (Clint Olsen) (2006-01-31) |
Re: The case of directly-executable parsers rmathew@gmail.com (Ranjit Mathew) (2006-01-31) |
Re: The case of directly-executable parsers 148f3wg02@sneakemail.com (Karsten Nyblad) (2006-01-31) |
Re: The case of directly-executable parsers momchil.velikov@gmail.com (momchil.velikov@gmail.com) (2006-02-02) |
From: | "momchil.velikov@gmail.com" <momchil.velikov@gmail.com> |
Newsgroups: | comp.compilers |
Date: | 30 Jan 2006 02:01:00 -0500 |
Organization: | http://groups.google.com |
References: | 06-01-10406-01-109 |
Keywords: | parse, performance |
Posted-Date: | 30 Jan 2006 02:01:00 EST |
Derek M. Jones wrote:
> Velco,
>
> > What could be the reason for this parser performing so poorly? Or for
> > the Bison parser performing so well? Has anyone got some recent
> > observations with directly-executable vs. table driven recognizers -
> > scanning (re2c?), parsing, tree-matching?
>
> The generated code contains lots of jumps. I am guessing that
> instruction pipeline never gets to crank up before it is flushed, also
> all those jumps must be very bad for lookahead prediction and I dread
> to think about the cache swapping that must be going on.
>
> In the "good old days" jumps did not have such a big impact on
> performance (well, at least on many processors). So I am guessing
> that the excessive number of jumps are skewing the performance
> characteristics, which did not happen on the processors used for the
> previous timing comparisons.
Yeah, that sounds reasonable. It probably also explains why on PIII
Xeon the speedup is a bit better.
> Also gcc does not seem to be consistent in optimizing those switch ()
> { case L: goto Q; ...} constructs. Sometimes it turns the switch into
> a nested-if, other times it populates the jump table with the
> destination address of the goto.
I have also implemented a kind of a "switch optimizer". After
removing the most frequent transition, the keys are separated in
"dense" ranges, where density threshold is a parameter (say, 0.33 or
0.5). Then a linear or a binary search is performed depending on the
number of ranges (say >= 8 binary search, otherwise - linear).
This did not result in any performance gain, although it did reduce the
code size by a decent amount.
> Have you tried using dynamic profiling information to get gcc to
> tune the generated code? That might result in some of the less
> frequently executed code moved some place where it won't keep
> filling the cache.
Yes, the results were obtained with gcc 4.0.3 or 4.2 as follows:
1) compilation with -march=<whatever> -O3 -fomit-frame-pointer
-fprofile-generate
2) one run on sample data
3) compilation with -march=<whatever> -O3 -fomit-frame-pointer
-fprofile-use
4) timing on the same data
~velco
Return to the
comp.compilers page.
Search the
comp.compilers archives again.