Re: The case of directly-executable parsers

"momchil.velikov@gmail.com" <momchil.velikov@gmail.com>
30 Jan 2006 02:01:00 -0500

From comp.compilers

Related articles
The case of directly-executable parsers momchil.velikov@gmail.com (2006-01-28)
Re: The case of directly-executable parsers derek@_NOSPAM_knosof.co.uk (Derek M. Jones) (2006-01-28)
Re: The case of directly-executable parsers momchil.velikov@gmail.com (momchil.velikov@gmail.com) (2006-01-30)
Re: The case of directly-executable parsers clint@0lsen.net (Clint Olsen) (2006-01-31)
Re: The case of directly-executable parsers rmathew@gmail.com (Ranjit Mathew) (2006-01-31)
Re: The case of directly-executable parsers 148f3wg02@sneakemail.com (Karsten Nyblad) (2006-01-31)
Re: The case of directly-executable parsers momchil.velikov@gmail.com (momchil.velikov@gmail.com) (2006-02-02)
From: "momchil.velikov@gmail.com" <momchil.velikov@gmail.com>
Newsgroups: comp.compilers
Date: 30 Jan 2006 02:01:00 -0500
Organization: http://groups.google.com
References: 06-01-104 06-01-109
Keywords: parse, performance
Posted-Date: 30 Jan 2006 02:01:00 EST

Derek M. Jones wrote:
> Velco,
>
> > What could be the reason for this parser performing so poorly? Or for
> > the Bison parser performing so well? Has anyone got some recent
> > observations with directly-executable vs. table driven recognizers -
> > scanning (re2c?), parsing, tree-matching?
>
> The generated code contains lots of jumps. I am guessing that the
> instruction pipeline never gets to crank up before it is flushed;
> all those jumps must also be very bad for lookahead prediction, and
> I dread to think about the cache swapping that must be going on.
>
> In the "good old days" jumps did not have such a big impact on
> performance (well, at least on many processors). So I am guessing
> that the excessive number of jumps is skewing the performance
> characteristics, which did not happen on the processors used for the
> previous timing comparisons.


Yeah, that sounds reasonable. It probably also explains why the
speedup is a bit better on a PIII Xeon.


> Also gcc does not seem to be consistent in optimizing those switch ()
> { case L: goto Q; ...} constructs. Sometimes it turns the switch into
> a nested-if, other times it populates the jump table with the
> destination address of the goto.
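
For reference, the construct being discussed looks roughly like this (a
minimal sketch of a directly-executable parser's dispatch, with made-up
states and tokens, not the actual generated code):

    /* Each automaton state becomes a label; the switch on the current
       token either jumps to the next state or reports an error. */
    static int parse(const int *input)
    {
        int sym = *input++;

    state_0:
        switch (sym) {
        case 'a': sym = *input++; goto state_1;
        case 'b': sym = *input++; goto state_2;
        default:  return -1;                /* syntax error */
        }

    state_1:
        switch (sym) {
        case 'b': sym = *input++; goto state_2;
        default:  return -1;
        }

    state_2:                                /* accept in this toy example */
        return 0;
    }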


I have also implemented a kind of "switch optimizer". After removing
the most frequent transition, the keys are separated into "dense"
ranges, where the density threshold is a parameter (say, 0.33 or 0.5).
Then a linear or a binary search is performed, depending on the number
of ranges (say, binary search for >= 8 ranges, linear otherwise).


This did not result in any performance gain, although it did reduce the
code size by a decent amount.
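
Roughly, the range search described above amounts to the following (a
minimal, data-driven sketch only; the struct and function names are
mine, and the actual generator presumably emits the comparisons inline
rather than going through a table at run time):

    #include <stddef.h>

    /* A "dense" range of keys, built offline so that the number of keys
       present divided by (hi - lo + 1) meets the density threshold.
       Ranges are assumed sorted by lo and non-overlapping. */
    struct range {
        int lo, hi;            /* inclusive key bounds */
        const int *targets;    /* targets[key - lo]; holes hold the default */
    };

    /* Look up the transition for `key`: binary search over the ranges
       when there are many of them, a linear scan otherwise, and the most
       frequent ("default") transition when no range matches. */
    static int dispatch(const struct range *r, size_t nranges,
                        int key, int default_target)
    {
        if (nranges >= 8) {                    /* binary search */
            size_t lo = 0, hi = nranges;
            while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;
                if (key < r[mid].lo)
                    hi = mid;
                else if (key > r[mid].hi)
                    lo = mid + 1;
                else
                    return r[mid].targets[key - r[mid].lo];
            }
        } else {                               /* linear scan */
            for (size_t i = 0; i < nranges; i++)
                if (key >= r[i].lo && key <= r[i].hi)
                    return r[i].targets[key - r[i].lo];
        }
        return default_target;
    }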


> Have you tried using dynamic profiling information to get gcc to
> tune the generated code? That might result in some of the less
> frequently executed code being moved somewhere it won't keep
> filling the cache.


Yes, the results were obtained with gcc 4.0.3 or 4.2 as follows:
1) compilation with -march=<whatever> -O3 -fomit-frame-pointer -fprofile-generate
2) one run on sample data
3) compilation with -march=<whatever> -O3 -fomit-frame-pointer -fprofile-use
4) timing on the same data


~velco

