EQNTOTT Vectors of 16 bit Numbers [Was: Re: Yikes!!! New 200Mhz Intel P6 Benchmarks]

glew@ichips.intel.com (Andy Glew)
Thu, 9 Nov 1995 08:01:47 GMT

          From comp.compilers


Newsgroups: comp.sys.intel,comp.benchmarks,comp.compilers
From: glew@ichips.intel.com (Andy Glew)
Keywords: architecture, optimize, 586
Organization: Intel Corp., Hillsboro, Oregon
References: <478ja4$6fu@nnrp3.news.primenet.com> <47j9hm$tdp@caesar.ultra.net>
Date: Thu, 9 Nov 1995 08:01:47 GMT

        >Intel has hacked their SPEC compilers to give a 23% performance boost
        >to their numbers... hence the "magic" 23% performance increase in the
        >P5 product line vs. the data they posted several months ago.
        >The rest of the industry will do the same in the upcoming weeks. So
        >you'll see Alpha, MIPS, SPARC, and all the rest suddenly increase
        >their SPECint performance by 23%.
        >Think you'll get a "magic" 23% performance increase from MS Word or
        >any other application? Wanna buy some cheap Florida swamp land at a
        >fantastic low price?

By this, Jeff, you are probably referring to the EQNTOTT hack:
vectorizing 16 bit shorts, so that you can do them 32 bits at a
time. As described by a poster from DEC:

        >From: neideck@nestvx.enet.dec.com (Burkhard Neidecker-Lutz)

        >The optimization in question is byte/short vectorization of loops
        >involving bytes or shorts. The SPECint92 suite contains a number
        >of programs where this is applicable, the one that exploded in performance
        >when it was applied by Intel was EQNTOTT (which alone is about 18% of
        >the SPECint92 increase they got).

        >Byte/short vectorization works by coalescing multiple byte/short operations
        >into single word/double-word operations. EQNTOTT in particular spends
        >a lot of time comparing two arrays of shorts. This optimization was
        >probably pioneered for the SPEC suite by Digital, as we got hurt badly
        >due to Alphas earlier lack of partial-word access (i.e. there is
        >a bigger payback on the Alpha than on other machines).

        >If you don't believe that your "typical" application codes benefit from
        >partial-word vectorization, you should ignore SPECint92 results and
        >use solely SPECint95 results (which is a good idea anyway). EQNTOTT
        >was removed from the SPECint95 suite partially for this reason.

        > Burkhard Neidecker-Lutz
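Concretely, the coalescing Burkhard describes looks roughly like this. This is a hand-written sketch of the idea, not what any compiler actually emits; the function names are just for illustration, and only the equality-test case is shown (EQNTOTT's real comparison routine also has to produce an ordering, which is harder to coalesce):

```c
#include <stdint.h>
#include <string.h>

/* Scalar version: one 16-bit compare per iteration. */
int neq_scalar(const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++)
        if (a[i] != b[i])
            return 1;
    return 0;
}

/* Coalesced version: two 16-bit elements per 32-bit compare.
   Valid for equality tests because two adjacent shorts match
   exactly when the 32-bit words containing them match, on any
   byte order.  memcpy is a portable way to express the word
   load without alignment or aliasing trouble; compilers turn
   it into a plain 32-bit load. */
int neq_coalesced(const int16_t *a, const int16_t *b, int n)
{
    int i = 0;
    for (; i + 1 < n; i += 2) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            return 1;
    }
    for (; i < n; i++)          /* odd tail element, if any */
        if (a[i] != b[i])
            return 1;
    return 0;
}
```

On a machine with 64-bit registers the same loop can step four shorts at a time, which is why the payback is bigger on Alpha.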

Since I think I was the first Intelloid to code this up (Dave
Papworth, my boss, suggested it; Wayne Scott did subsequent tweaking;
and the compiler folk did even better), mind if I respond?

(1) DEC Alpha did it first, in released SPEC numbers (as we learned
subsequently), because, on Alpha, 16 bit operations were really bad.
Moreover, Alpha gets even more benefit out of it than P6 does,
because of their 64 bit registers. (Hmm.... maybe we should be using
FIST/FILD to do this 64 bits at a time?)

(2) Yes, this is a special case optimization - but it's a special
case that occurs in a lot of code. As our compilers get better, it
will benefit many other programs that operate on vectors of 8 or 16
bit data. (Or, on a 64 bit processor, on vectors of 32 bit data.)

        Maybe I shouldn't point it out to the competition, but this is
exactly the sort of optimization that a compiler would use for Sun
UltraSPARC's VIS instruction set extensions.

Jeff Maggard asks whether this will help MS Word. Well, I'm afraid
that I don't have the source code to MS Word... It probably won't help, not
right away. But, since MS Word *does* do lots of operations on 8 and
16 bit data, it might very well help in the future. (Of course, it's
possible that MS Word already does this sort of optimization, in the
assembly language kernel.)

        >Don't like it? Blame Intel.
        >- jeff

You don't like compilers getting better?

You don't like the optimizations that drove matmul300 out of SPEC?
Blame KAI.

Myself, I *want* compilers to get as good as possible; I *want*
compilers to be able to do all of the dirty tricks that assembly
language programmers are used to doing, so that said assembly
language programmers can spend time working on useful things rather
than tuning code.

You might as well ask me if I feel guilty about promoting the EQNTOTT
byte/word coalescing optimization into Intel's compilers.

A: Not at all. It's a general purpose optimization, applicable to any
program that manipulates vectors of 8 and 16 bit numbers. (Don't know
about you, but I've written a lot of such programs.)

To be perfectly honest, I feel that a lot of the optimizations that
our compiler did to EQNTOTT *before* the vector-of-16-bit optimization
- rearrangements of IF statements for which there was no a priori
basis without profiling feedback - were a lot dirtier than the present
EQNTOTT optimization.
        It's just our luck that we stumbled on a general optimization that
gave a dramatic performance increase on a single benchmark.


Andy "Krazy" Glew, glew@ichips.intel.com, Intel,
M/S JF1-19, 5200 NE Elam Young Pkwy, Hillsboro, Oregon 97124-6497.
Place URGENT in email subject line for mail filter prioritization.
