Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

gah4 <gah4@u.washington.edu>
Wed, 8 Feb 2023 01:04:38 -0800 (PST)

          From comp.compilers

Related articles
Are there different programming languages that are compiled to the same intermediate language? costello@mitre.org (Roger L Costello) (2023-01-29)
Re: Are there different programming languages that are compiled to the same intermediate language? gah4@u.washington.edu (gah4) (2023-01-31)
Re: Are there different programming languages that are compiled to the same intermediate language? gah4@u.washington.edu (gah4) (2023-02-02)
Re: Are there different programming languages that are compiled to the same intermediate language? anton@mips.complang.tuwien.ac.at (2023-02-03)
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? gah4@u.washington.edu (gah4) (2023-02-03)
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? gah4@u.washington.edu (gah4) (2023-02-04)
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? anton@mips.complang.tuwien.ac.at (2023-02-07)
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? gah4@u.washington.edu (gah4) (2023-02-08)
From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Date: Wed, 8 Feb 2023 01:04:38 -0800 (PST)
Organization: Compilers Central
References: <Adkz+TvWa4zLl8W9Qd6ovtClKZpZrA==> 23-01-078 23-02-001 23-02-007 23-02-011 23-02-015 23-02-027
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="7682"; mail-complaints-to="abuse@iecc.com"
Keywords: architecture
Posted-Date: 08 Feb 2023 11:34:36 EST
In-Reply-To: 23-02-027

On Tuesday, February 7, 2023 at 6:24:44 PM UTC-8, Anton Ertl wrote:


(snip)


> >All OoO processors have a limit
> >to how far they can go. But the compiler does not have that limit.


> And the compiler can make more sophisticated scheduling decisions
> based on the critical path, while the hardware scheduler picks the
> oldest ready instruction. These were the ideas that seduced Intel,
> HP, and Transmeta to invest huge amounts of money into EPIC ideas.


> But the compiler has other limits. It cannot schedule across indirect
> calls (used in object-oriented dispatch), across compilation unit
> boundaries (in particular, calls to and returns from shared
> libraries).
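As a concrete sketch of the scheduling difference described above: a compiler's list scheduler can rank ready instructions by critical-path length, while an OoO hardware scheduler typically just takes the oldest ready one. This is a hypothetical Python illustration (the instruction names, latencies, and one-op-per-cycle issue model are made up for the example), not code from any real compiler:

```python
# Hypothetical sketch: list scheduling by critical path.  Each op has a
# latency and dependence edges; priority is the length of the longest
# path (in cycles) from the op to the end of the dependence graph.

def critical_path(ops, succs, latency):
    """Longest path from each op to a leaf of the dependence DAG, in cycles."""
    cp = {}
    def walk(op):
        if op not in cp:
            cp[op] = latency[op] + max((walk(s) for s in succs[op]), default=0)
        return cp[op]
    for op in ops:
        walk(op)
    return cp

def list_schedule(ops, preds, succs, latency):
    """Greedy list scheduler for a 1-issue machine: each cycle, issue the
    ready op with the longest critical path (hardware would instead pick
    the oldest ready op)."""
    cp = critical_path(ops, succs, latency)
    done_at = {}                      # op -> cycle its result is available
    cycle, scheduled = 0, []
    remaining = set(ops)
    while remaining:
        ready = [op for op in remaining
                 if all(p in done_at and done_at[p] <= cycle
                        for p in preds[op])]
        if ready:
            op = max(ready, key=lambda o: cp[o])   # longest critical path wins
            scheduled.append((cycle, op))
            done_at[op] = cycle + latency[op]
            remaining.remove(op)
        cycle += 1
    return scheduled
```

With a long load-multiply-add-store chain plus an independent increment, the scheduler starts the chain first (it dominates the critical path) and slots the independent op into an otherwise idle cycle.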


OK, maybe we are getting closer.
Many processors now have speculative execution:
if you don't know what else to do, execute some instructions
just in case, but don't commit the results to actual memory or registers yet.


And Itanium doesn't do that. I don't see, though, why it wouldn't be
possible to have EPIC and speculative execution together.
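What EPIC offers instead of branch speculation is predication (if-conversion): execute both arms of a conditional and commit by predicate, so there is no branch to mispredict. This is a hypothetical Python sketch of the idea only; the function names are made up and the select step stands in for IA-64's predicate registers, not actual IA-64 code:

```python
# Hypothetical sketch of if-conversion (predication): instead of
# branching on data, compute both arms and select by a predicate,
# the way IA-64 guards each instruction with a predicate register.

def branchy_abs_diff(a, b):
    # Conventional code: the branch direction depends on the data,
    # so a predictor may guess wrong.
    if a > b:
        return a - b
    else:
        return b - a

def predicated_abs_diff(a, b):
    # If-converted: no branch at all.
    p = a > b                 # predicate "register"
    t1 = a - b                # would execute under predicate p
    t2 = b - a                # would execute under not-p
    return t1 if p else t2    # commit the arm whose predicate is true
```

Both versions compute the same result; the predicated one trades a possible misprediction for always doing the work of both arms.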




> Another important limit is the predictability of
> branches. Static branch prediction using profiles has ~10%
> mispredictions, while (hardware) dynamic branch prediction has a much
> lower misprediction rate (I remember numbers like 3% (for the same
> benchmarks that have 10% mispredictions with static branch prediction)
> in papers from the last century; I expect that this has improved even
> more in the meantime.) If the compiler mispredicts, it will schedule
> instructions from the wrong path, instructions that will be useless
> for execution.


OK, but we just discussed speculative execution. So you can execute
two paths or maybe three.
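For reference, the classic dynamic mechanism behind those low hardware misprediction rates is the 2-bit saturating counter. This is a textbook-style Python sketch (not any particular CPU's predictor; the class name and starting state are my choices):

```python
# Hypothetical sketch: a 2-bit saturating counter branch predictor.
# States 0-1 predict not-taken, 2-3 predict taken; each outcome nudges
# the counter, so the predictor adapts to the program's actual behavior
# instead of relying on a compile-time profile.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2              # start weakly "taken"

    def predict(self):
        return self.state >= 2      # True = predict taken

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

def misprediction_rate(predictor, outcomes):
    """Run the predictor over a branch-outcome trace; return the
    fraction of branches it predicted wrong."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong / len(outcomes)
```

On a loop branch taken 99 times and then not taken once, it mispredicts only the loop exit: a 1% rate on that trace.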




> In the end a compiler typically can schedule across a few dozen
> instructions, while hardware can schedule across a few hundred.
> Compiler scheduling works well for simple loops, and that's where
> IA-64 shone, but only doing loops well is not good enough for
> general-purpose software.


OK, loops were important for the 360/91, which was meant for floating-point
number crunching. (Only the floating-point unit is OoO.)
And small inner loops are usual in those programs. One of its
features is loop mode, where the processor keeps the loop's instructions
buffered and doesn't have to keep fetching them.


But also, the 360/91 was designed to run code written for any
S/360 model: code with no special ordering, and even self-modifying
code. Even more, S/360 has only four floating-point registers,
so register renaming was important. Code had to reuse registers
heavily, creating false dependences that out-of-order execution
with register renaming could remove.
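A hypothetical sketch of what renaming does with those four FP registers (the physical-register names "P0, P1, ..." and the three-operand instruction format are made up for illustration):

```python
# Hypothetical sketch of register renaming.  With only four architectural
# FP registers (F0, F2, F4, F6), independent computations are forced to
# reuse names; mapping each *write* to a fresh physical register
# dissolves the false WAR/WAW dependences, while true (read-after-write)
# dependences are preserved through the mapping.

def rename(instructions, arch_regs):
    """instructions: list of (dest, src1, src2) architectural-register tuples.
    Returns the same instructions over physical registers."""
    mapping = {r: r for r in arch_regs}   # arch reg -> current physical reg
    next_phys = 0
    renamed = []
    for dest, src1, src2 in instructions:
        s1, s2 = mapping[src1], mapping[src2]   # read current mappings first
        phys = f"P{next_phys}"                  # fresh register for the write
        next_phys += 1
        mapping[dest] = phys
        renamed.append((phys, s1, s2))
    return renamed
```

Two back-to-back writes to F0 end up in different physical registers, so the second no longer has to wait for the first; a later read of F0 correctly picks up the most recent mapping.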


Okay, so I am not saying that EPIC has to get all the ordering right,
only close enough. So, current processors can keep hundreds of
instructions in flight at the same time?


> >Now, since transistors are cheap now, and one can throw a large
> >number into reorder buffers and such, one can build really deep
> >pipelines.


> It's interesting that Intel managed to produce their first OoO CPU
> (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
> first Itanium) at 25.4M transistors was too large for the 250nm and
> they had to switch to 180nm (contributing to the delays). So, while
> the theory was that the EPIC principle would reduce the hardware
> complexity (to allow adding more functional units for increased
> performance), in Itanium practice the hardware was more complex, and
> the performance advantages did not appear.


I remember lots of stories about how the PPro didn't do well with the 16-bit
code still in much of DOS and Windows, enough that it was slower
than the Pentium. I am not sure by now how much OoO the PPro does.


> >But the reason for bringing this up, is that if Intel had a defined
> >intermediate code, and supplied the back end that used it,
> >and even more, could update that back end later, that would have
> >been very convenient for compiler writers.


> Something like this happened roughly at the same time with LLVM.
> There were other initiatives, but LLVM was the one that succeeded.


> There was the Open Research Compiler for IA-64 from Intel and the
> Chinese Academy of Sciences.


> SGI released their compiler targeted at IA-64 as Open64.


> >Even more, design for it could have been done in parallel with the
> >processor, making both work well together.


> Intel, HP, and others worked on compilers in parallel to the hardware
> work. It's just that the result did not perform as well for
> general-purpose code as OoO processors with conventional compilers.


> >[Multiflow found that VLIW compile-time instruction scheduling was
> >swell for code with predictable memory access patterns, much less so
> >for code with data-dependent access patterns.


> Multiflow and Cydrome built computers for numerical computations (aka
> HPC aka (mini-)supercomputing). These tend to spend a lot of time in
> simple loops with high iteration counts and have statically
> predictable branches. They found compiler techniques like trace
> scheduling (Multiflow) that work well for code with predictable
> branches, and modulo scheduling (Cydrome) that work well for simple
> loops with high iteration counts. The IA-64 and Transmeta architects
> wanted to extend these successes to general-purpose computing, but it
> did not work out.
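A Python sketch of the overlap that modulo scheduling arranges, using a made-up three-stage loop body (load, multiply, store) as the example; real modulo scheduling derives the initiation interval from resource and dependence constraints, which this toy simulation simply hard-codes:

```python
# Hypothetical sketch of software pipelining (the effect Cydrome's
# modulo scheduling automates).  The loop body
#     load x[i];  t = x[i] * c;  store t into y[i]
# is overlapped so that in steady state one iteration's store, the next
# iteration's multiply, and the following iteration's load all issue in
# the same trip of the pipelined loop.

def pipelined_scale(x, c):
    n = len(x)
    y = [None] * n
    loaded = multiplied = None       # "pipeline registers" between stages
    # Each trip is one initiation interval: stage 3 for iteration i-2,
    # stage 2 for iteration i-1, stage 1 for iteration i.  The first two
    # trips are the prologue, the last two the epilogue.
    for i in range(n + 2):
        if i >= 2:
            y[i - 2] = multiplied    # stage 3: store (iteration i-2)
        if 1 <= i <= n:
            multiplied = loaded * c  # stage 2: multiply (iteration i-1)
        if i < n:
            loaded = x[i]            # stage 1: load (iteration i)
    return y
```

The key property is visible in the loop body: the three stages touch different iterations, so on a machine with enough functional units they could all issue in one cycle.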


I sometimes work on computational physics problems, the kind with
tight loops of floating-point operations. I believe that some Itanium
systems went into a large cluster somewhere. I got an RX2600
for $100 some years ago, when there were many on eBay.
(You could even buy three at a time.)


But yes, I suspect that for web servers they aren't especially good.


> Concerning memory access patterns: While they are not particularly
> predictable in general-purpose code, most general-purpose code
> benefits quite a lot from caches (more than numerical code), so I
> don't think that this was a big problem for IA-64.


> Some people mention varying latencies due to caches as a problem, but
> the choice of a small L1 cache (16KB compared to 64KB for the earlier
> 21264 and K7 CPUs) for McKinley indicates that average latency was
> more of a problem for IA-64 than varying latencies.


> >And if the memory access is that predictable, you can
> >likely use SIMD instructions instead. -John]


> Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
> programs where IA-64 outdid the mainstream.
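A rough illustration of the scalar-versus-SIMD difference, using pure-Python stand-ins for packed instructions (the 4-lane width and function names are just for the example; real code would use SSE/AVX/NEON instructions or intrinsics):

```python
# Hypothetical sketch of why SIMD works so well on numerical loops:
# one packed instruction does the work of several scalar ones.

VEC_WIDTH = 4

def scalar_add(a, b):
    # One add instruction per element: n adds for n elements.
    return [x + y for x, y in zip(a, b)]

def simd_add(a, b):
    # One "packed add" per 4-element chunk: about n/4 instructions,
    # so roughly 4x fewer instructions to fetch, decode, and issue.
    out = []
    for i in range(0, len(a), VEC_WIDTH):
        lane_a = a[i:i + VEC_WIDTH]                         # load 4 lanes
        lane_b = b[i:i + VEC_WIDTH]
        out.extend(x + y for x, y in zip(lane_a, lane_b))   # one packed add
    return out
```

For the predictable, dense access patterns typical of numerical inner loops, this is exactly the case SIMD handles well, without needing EPIC-style scheduling at all.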


The Cray series of vector processors seems long gone by now.

