|CISC to RISC translator? Mikael.Larsson@lu.erisoft.se (1994-08-16)|
|Re: CISC to RISC translator? roedy@BIX.com (1994-08-18)|
|Re: CISC to RISC translator? firstname.lastname@example.org (1994-08-18)|
|Re: CISC to RISC translator? email@example.com (1994-08-18)|
|Re: CISC to RISC translator? firstname.lastname@example.org (1994-08-19)|
|Re: CISC to RISC translator? email@example.com (1994-08-24)|
|From:||firstname.lastname@example.org (David Keppel)|
|Organization:||Computer Science & Engineering, U. of Washington, Seattle|
|Date:||Thu, 18 Aug 1994 17:47:41 GMT|
(Mikael Larsson) writes:
>[It seems CISC code translated to RISC code should be more efficient
> than emulated code, though worse than native code.]
Many emulators are in fact doing ``translation'', which is just a
fancy name for compiling from an input language that's a CISC
instruction set to an output language that's a RISC instruction set.
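To make the "translation is compiling" point concrete, here is a toy sketch (every instruction name is invented, not from any real architecture): a single CISC-style memory-operand instruction becomes a short load/modify/store sequence of RISC-style instructions.

```python
# Toy sketch: "translating" one CISC-style instruction into RISC-style
# instructions is just compiling from one instruction set to another.
# All opcode names here are made up for illustration.

def translate(cisc_insn):
    """Translate a single CISC instruction (a tuple) into a list of RISC ops."""
    op, dst, src = cisc_insn
    if op == "add_mem":                       # add [dst_addr], src_reg
        return [("load",  "tmp", dst),        # tmp <- mem[dst_addr]
                ("add",   "tmp", "tmp", src), # tmp <- tmp + src_reg
                ("store", dst, "tmp")]        # mem[dst_addr] <- tmp
    return [cisc_insn]                        # already RISC-like: pass through
```

One CISC instruction with a memory operand turns into three simple host instructions; the per-instruction expansion is where the emulation overhead comes from.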
My favorite survey article (but then I co-wrote it so I'm biased) is
the related work section in:
%A Robert F. Cmelik
%A David Keppel
%T Shade: A Fast Instruction-Set Simulator for Execution Profiling
%J Proceedings of the 1994 ACM SIGMETRICS Conference on the
Measurement and Modeling of Computer Systems
%D May 1994
%W Available via anonymous ftp from `ftp.cs.washington.edu'
(188.8.131.52) in `pub/pardo/shade.ps.Z' or via WWW from
Particularly relevant papers here are [Deutsch & Schiffman 1984], [May
1987], [Andrews & Sand 1992] and for the VAX -> AXP translator, [Sites,
Chernoff, Kirk, Marks and Robinson 1993]. I don't have any truly good
references on SoftPC, sorry.
Before you write off CISC hardware and CISC-boosted RISC hardware as
unnecessary, realize that squeezing better and better performance out of
a pure software emulator gets harder and harder. Moreover, the greater
the discrepancy between target and host, the harder it gets.
So, for example, it's fairly straightforward to build an emulator that
costs on the order of ten to twenty host instructions per simulated
instruction, much harder to do two host instructions per simulated
instruction, and damn hard to do two where there's a big discrepancy
between the target and host (e.g. word representations, floating-point
formats, user-controlled segmentation, ...).
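The low end of that range is easy to see in a decode-and-dispatch interpreter. This toy (a made-up two-register target) pays for a fetch, a decode, and a dispatch on the host for every simulated instruction, which is roughly where the ten-to-twenty figure comes from:

```python
# Minimal decode-and-dispatch interpreter for an invented two-register
# target machine. Each simulated instruction costs a fetch, a decode,
# and a dispatch on the host -- many host instructions per target one.

def run(program, regs):
    pc = 0
    while pc < len(program):
        op, a, b = program[pc]      # fetch + decode
        if op == "mov":             # dispatch...
            regs[a] = b
        elif op == "add":
            regs[a] += regs[b]
        elif op == "halt":
            break
        pc += 1
    return regs

regs = run([("mov", "r0", 2), ("mov", "r1", 3),
            ("add", "r0", "r1"), ("halt", 0, 0)],
           {"r0": 0, "r1": 0})
```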
One way to look at the problem is that lower-performance simulators can
emulate each instruction in isolation, but higher-performance simulators
don't have the time. Thus, they must ``clump'' instructions, so that
during the middle of a ``clump'' the host machine state may bear little
resemblance to the simulated target's, but by the end of the ``clump''
the two states have converged.
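A toy sketch of clumping (again with an invented mini instruction set): a whole block of target instructions is turned into one host function that keeps the target registers in host locals. Mid-block, the stored target state is stale; it only converges with reality at the write-back when the block exits.

```python
# Sketch of "clumping": translate a whole block at once. Target
# registers live in host locals inside the clump, so the state dict is
# stale mid-block; the two states converge only at the final write-back.

def translate_block(block):
    def clump(state):
        r0, r1 = state["r0"], state["r1"]   # load target state into host locals
        for insn in block:
            if insn == ("add", "r0", "r1"): # only op in this toy: r0 += r1
                r0 += r1                    # `state` is now stale
        state["r0"], state["r1"] = r0, r1   # write back: states converge
        return state
    return clump

f = translate_block([("add", "r0", "r1"), ("add", "r0", "r1")])
```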
Another way to look at the problem is that what you really wanted to do
was compile the original source code to native code on the host. You
don't have the source, so you _decompile_ the target code to some
high(er)-level language, then compile to the host machine. See, for
example, Jim Reuter's `decomp', available from the above ftp site in
`pub/decomp.tar.Z' and `pub/pardo/decomp-samples.tar.Z'.
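As a toy sketch of that pipeline (not how `decomp` itself works): target instructions are lifted to a fragment of higher-level source text, which a native compiler for the host could then optimize with all its usual machinery.

```python
# Toy "decompile, then recompile" pipeline with invented opcodes:
# lift target instructions to C-like source text; a host compiler
# would then compile and optimize the result as ordinary code.

def decompile(insns):
    lines = []
    for op, dst, src in insns:
        if op == "load_imm":
            lines.append(f"{dst} = {src};")
        elif op == "add":
            lines.append(f"{dst} = {dst} + {src};")
    return "\n".join(lines)

src = decompile([("load_imm", "x", 10), ("add", "x", "y")])
```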
Both of these techniques try to skip as much of the target-dependent work
as possible, while still producing the correct result. For example,
target machine condition code operations exist solely as an artifact of
compiling the original program. If the translator can grok a big enough
code fragment at once, it can determine that the condition code operation
is being used to implement a loop, and then go ahead and implement the loop
in the most efficient way on the host, without any simulation of the
target's condition codes.
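The condition-code case can be sketched like this (a hypothetical decrement-and-branch-if-not-zero idiom, names invented): the flag is computed only to drive the loop branch, so a translator that groks the whole loop can drop the flag entirely and test the counter directly.

```python
# Toy comparison: faithfully simulating a "dec r; jnz L" loop's zero
# flag versus translating the whole idiom into a host loop that never
# materializes any condition codes. Both compute the same result.

def run_loop_naive(count):
    """Faithful simulation: the zero flag is set every iteration."""
    iters = 0
    while True:
        count -= 1
        zero_flag = (count == 0)   # computed, consumed once, discarded
        iters += 1
        if zero_flag:
            break
    return iters

def run_loop_translated(count):
    """Translated form: no flag at all; the loop tests the counter."""
    iters = 0
    while count > 0:
        count -= 1
        iters += 1
    return iters
```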
The problem with both approaches is ensuring that you capture all the
tricky semantics that *might* be expressed in the target code. For
example, in the presence of aliasing you don't know whether a store
through one pointer will clobber a value read via another pointer. If you
don't resolve this somehow, you are effectively limited to the same
load/store sequences and register allocation used for the original target
code. If you're emulating an 80x86, this means intensive memory use and
that you are only allowed to use 8 of your 32 RISC host registers. (I'm
overstating the problem, but making sure you've captured all the possible
situations is hard.)
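The aliasing hazard in miniature (a contrived sketch, with memory as a plain list and pointers as indices): a value loaded through one pointer cannot be kept in a host register across a store through another pointer unless the translator can prove the two never alias.

```python
# Toy aliasing hazard: caching a loaded value in a host "register"
# across a store is only safe if the pointers provably don't alias.
# Memory is a list; "pointers" p and q are indices into it.

def unsafe_cached(mem, p, q):
    cached = mem[q]        # load once, cache in a host register
    mem[p] = 99            # store through p...
    return cached          # ...stale if p aliases q

def safe_reload(mem, p, q):
    _ = mem[q]             # first load, as in the target code
    mem[p] = 99
    return mem[q]          # reload after the possibly-aliasing store
```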
Better symbolic information translates directly into better performance,
but, unfortunately, such information is rarely available for older
programs. However, older programs are exactly the ones unlikely to appear
soon as native code for your machine, and thus the ones you're most
likely to want to emulate.
Unsafe assumptions also translate directly into better performance. For
example, the emulator may be unable to prove that condition codes are
never valid across indirect jumps, but where that's true, the emulation
may be much faster. I refer to them as ``unsafe assumptions'' because
they're true for most parts of most programs but if applied blindly will
cause occasional failures.
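One way a translator can exploit such an assumption while surviving its occasional failure (a hypothetical scheme, all names invented): emit the fast, flag-free translation by default, but fall back to a slow, flag-accurate translation for the rare block observed to read flags on entry.

```python
# Sketch of applying an "unsafe assumption" with an escape hatch:
# assume condition codes are dead across indirect jumps and skip
# computing them, but keep a slow flag-accurate path for the rare
# block where the assumption is observed to fail.

def translate_block(block_reads_flags_on_entry, fast, slow):
    """Pick a translation: fast (flags skipped) unless the assumption fails."""
    return slow if block_reads_flags_on_entry else fast

# Two translations of the same toy block "r0 += 1":
fast = lambda state: {**state, "r0": state["r0"] + 1}              # no flags
slow = lambda state: {**fast(state), "flags": state["r0"] + 1 == 0}

f = translate_block(False, fast, slow)   # common case: flags assumed dead
```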
This just covers instruction set simulation. As John points out,
emulating devices can get tricky, too.
To summarize: instruction set simulation gets harder as the performance
demands go up. Each time you improve a simulator to halve the performance
hit, you generally more than double the sophistication of the simulator.
In principle, emulated code can run at close to the speed of native code,
but in practice it's hard to judge the side-effects of some instructions
and thus hard to produce a very fast emulation that is guaranteed to be
correct.
;-D on ( Have you used a CISC simu lately? ) Pardo