Re: The case of directly-executable parsers

"momchil.velikov@gmail.com" <momchil.velikov@gmail.com>
30 Jan 2006 02:01:00 -0500

From comp.compilers

Related articles
The case of directly-executable parsers momchil.velikov@gmail.com (2006-01-28)
Re: The case of directly-executable parsers derek@_NOSPAM_knosof.co.uk (Derek M. Jones) (2006-01-28)
*Re: The case of directly-executable parsers momchil.velikov@gmail.com (momchil.velikov@gmail.com)* (2006-01-30)**
Re: The case of directly-executable parsers clint@0lsen.net (Clint Olsen) (2006-01-31)
Re: The case of directly-executable parsers rmathew@gmail.com (Ranjit Mathew) (2006-01-31)
Re: The case of directly-executable parsers 148f3wg02@sneakemail.com (Karsten Nyblad) (2006-01-31)
Re: The case of directly-executable parsers momchil.velikov@gmail.com (momchil.velikov@gmail.com) (2006-02-02)

| List of all articles for this month |

From:	"momchil.velikov@gmail.com" <momchil.velikov@gmail.com>
Newsgroups:	comp.compilers
Date:	30 Jan 2006 02:01:00 -0500
Organization:	http://groups.google.com
References:	06-01-104 06-01-109
Keywords:	parse, performance
Posted-Date:	30 Jan 2006 02:01:00 EST

Derek M. Jones wrote:
> Velco,
>
> > What could be the reason for this parser performing so poorly? Or for
> > the Bison parser performing so well? Has anyone got some recent
> > observations with directly-executable vs. table driven recognizers -
> > scanning (re2c?), parsing, tree-matching?
>
> The generated code contains lots of jumps. I am guessing that
> instruction pipeline never gets to crank up before it is flushed, also
> all those jumps must be very bad for lookahead prediction and I dread
> to think about the cache swapping that must be going on.
>
> In the "good old days" jumps did not have such a big impact on
> performance (well, at least on many processors). So I am guessing
> that the excessive number of jumps are skewing the performance
> characteristics, which did not happen on the processors used for the
> previous timing comparisons.

Yeah, that sounds reasonable. It probably also explains why on PIII
Xeon the speedup is a bit better.

> Also gcc does not seem to be consistent in optimizing those switch ()
> { case L: goto Q; ...} constructs. Sometimes it turns the switch into
> a nested-if, other times it populates the jump table with the
> destination address of the goto.

I have also implemented a kind of a "switch optimizer". After
removing the most frequent transition, the keys are separated in
"dense" ranges, where density threshold is a parameter (say, 0.33 or
0.5). Then a linear or a binary search is performed depending on the
number of ranges (say >= 8 binary search, otherwise - linear).

This did not result in any performance gain, although it did reduce the
code size by a decent amount.

> Have you tried using dynamic profiling information to get gcc to
> tune the generated code? That might result in some of the less
> frequently executed code moved some place where it won't keep
> filling the cache.

Yes, the results were obtained with gcc 4.0.3 or 4.2 as follows:
1) compilation with -march=<whatever> -O3 -fomit-frame-pointer
-fprofile-generate
2) one run on sample data
3) compilation with -march=<whatever> -O3 -fomit-frame-pointer
-fprofile-use
4) timing on the same data

~velco

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: The case of directly-executable parsers

"momchil.velikov@gmail.com" <momchil.velikov@gmail.com>30 Jan 2006 02:01:00 -0500

"momchil.velikov@gmail.com" <momchil.velikov@gmail.com>
30 Jan 2006 02:01:00 -0500