From: firstname.lastname@example.org (Andy Glew)
Date: Wed, 19 Sep 90 00:48:03 CDT
Many people have mentioned following branches, etc., to guide
disassembly. Doing this statically is obvious. You can also do it
dynamically, using the techniques used in generating profiling feedback
for a compiler. More branches can be followed that way - e.g. out of a
jump table. Moreover, several simplifying assumptions may be useful:
(1) Code is never executed "out of phase" - i.e. if a code sequence
begins with the 4 byte instruction at address A, there is no code
sequence beginning at address A+1.
(2) Code and data may be emulsified, but they aren't miscible -
i.e. addresses that are executed are not data; similarly, addresses
that are fetched as data are not code (there may be some boundary
effects).
Hackers may break these assumptions, but if all you are trying to
do is run binaries from machine A on machine B, they may be enough.
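
Whether a particular binary respects assumptions (1) and (2) can be checked against an execution trace. The following is only a minimal sketch, not something from the original post: it assumes a hypothetical trace format of (address, length) pairs for executed instructions and (address, size) pairs for data accesses, and the name check_assumptions is made up for illustration.

```python
def check_assumptions(executed, data_accesses):
    """Report violations of the two simplifying assumptions.

    executed:      iterable of (address, length), one per executed instruction
    data_accesses: iterable of (address, size), one per data load or store
    Both formats are hypothetical; a real tracer would define its own.
    Returns a list of messages, empty if the trace respects both assumptions.
    """
    violations = []

    # Collapse repeated executions of the same instruction.
    starts = {}
    for addr, length in executed:
        starts[addr] = length

    # Assumption (1): no instruction starts inside another instruction.
    code_bytes = {}  # byte address -> start address of the owning instruction
    for start in sorted(starts):
        for b in range(start, start + starts[start]):
            owner = code_bytes.get(b)
            if owner is not None and owner != start:
                violations.append(f"out of phase: {start:#x} overlaps {owner:#x}")
                break
            code_bytes[b] = start

    # Assumption (2): bytes that were executed are never fetched as data.
    for addr, size in data_accesses:
        for b in range(addr, addr + size):
            if b in code_bytes:
                violations.append(f"code read as data: {b:#x}")
                break

    return violations


if __name__ == "__main__":
    trace = [(0xF561, 4), (0xF565, 2), (0xF562, 2)]
    for message in check_assumptions(trace, [(0xF600, 4)]):
        print(message)
```

An empty result says only that the traced runs respected the assumptions; it says nothing about paths that were never executed.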
Conceptually, given a mixed code/data address space A, you can create
multiple code/data spaces c1(A), c2(A), etc., one for every possible
placement of address boundaries. Or, rather, you can start off your
disassembly in the following manner:
code_0x0000f561: /* 4 byte instruction */ ADD ...
code_0x0000f562: /* 2 byte instruction */ MOV ...
code_0x0000f563: /* 2 byte instruction */ ...
code_0x0000f564: /* 1 byte instruction */ ...
code_0x0000f565: /* lots of possible paths converge here */
Static branch following eliminates some code and data entries; dynamic
profiling eliminates a few more. Not all of the ambiguities may be
resolved, but the amount of replication will quickly fall to tolerable
levels.
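
The idea of starting with a candidate entry at every address and then pruning by static branch following can be sketched as below. Everything here is an assumption made for illustration: the six-opcode instruction set in OPCODES is a toy, not any real machine, and the names decode, all_candidates, follow_branches and label are hypothetical.

```python
# Toy instruction set (hypothetical): opcode byte -> (mnemonic, length in bytes).
# JMP and BCC take a signed 8-bit displacement relative to the next instruction.
OPCODES = {0x01: ("NOP", 1), 0x02: ("MOV", 2), 0x04: ("ADD", 4),
           0x10: ("JMP", 2), 0x11: ("BCC", 2), 0xFF: ("RET", 1)}


def decode(mem, pc):
    """Return (mnemonic, length, successors), or None if pc does not decode."""
    if not 0 <= pc < len(mem) or mem[pc] not in OPCODES:
        return None
    name, length = OPCODES[mem[pc]]
    if pc + length > len(mem):
        return None
    successors = []
    if name in ("JMP", "BCC"):
        raw = mem[pc + 1]
        disp = raw - 256 if raw >= 128 else raw
        successors.append(pc + length + disp)   # branch target
    if name not in ("JMP", "RET"):
        successors.append(pc + length)          # fall-through
    return name, length, successors


def all_candidates(mem):
    """One candidate entry for every address that decodes to something."""
    return {pc: decode(mem, pc) for pc in range(len(mem)) if decode(mem, pc)}


def follow_branches(mem, entries):
    """Keep only the candidates reachable from the known entry points."""
    live, work = {}, list(entries)
    while work:
        pc = work.pop()
        if pc in live:
            continue
        ins = decode(mem, pc)
        if ins is None:
            continue
        live[pc] = ins
        work.extend(ins[2])
    return live


def label(pc, base=0):
    """Label in the style used in the listing above."""
    return f"code_{base + pc:#010x}"


if __name__ == "__main__":
    program = bytes([0x04, 0x02, 0x01, 0x01, 0x11, 0x01, 0x01, 0xFF])
    print(len(all_candidates(program)), "candidate entries before pruning")
    for pc, (name, length, _) in sorted(follow_branches(program, [0]).items()):
        print(f"{label(pc, 0xF561)}: /* {length} byte instruction */ {name}")
```

In this toy program the operand bytes of ADD and BCC happen to decode as instructions too, so every offset starts out as a candidate; following branches from offset 0 leaves only the four entries that can actually be reached. Targets of indirect jumps are not visible to this kind of traversal, which is where dynamic profiling would come in.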
The same approach might be used for data representation - e.g. have
separate spaces for data addressed as bytes, words, etc. - except that
data is much more frequently accessed by different packet sizes. It's
almost easier to have your disassembly/reassembly support library
convert "load word at data_0x0000f562" into the required sequence of
loads and shifts (to handle byte ordering) than it is to attempt to
order the data "naturally" for the target machine. This produces a
run-time penalty for the translated binary, but hopefully the new
machine is that much faster anyway, and all you are trying to do is
gain access to the wonderful world of IBM PC/VAX/IBM 360 software?
(Or, for startup UNIX companies, maybe you're just trying to get
MIPS/SUN/Ultrix binaries running on your new hardware).
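
As a rough illustration of what such a support library routine might look like, here is a sketch that assembles a word from single-byte loads and shifts, so the result does not depend on the byte order of the machine running the translated code. The names load_word and store_word, the size parameter, and the source_order argument are assumptions for this example only.

```python
def load_word(mem, addr, size=4, source_order="little"):
    """Build an unsigned size-byte value from byte loads and shifts.

    mem is the source machine's memory image, kept byte for byte in its
    original layout; source_order is the byte order of the source machine.
    """
    value = 0
    for i in range(size):
        byte = mem[addr + i]
        # Little-endian: first byte is least significant; big-endian: most.
        shift = 8 * i if source_order == "little" else 8 * (size - 1 - i)
        value |= byte << shift
    return value


def store_word(mem, addr, value, size=4, source_order="little"):
    """Inverse of load_word: write value back one byte at a time."""
    for i in range(size):
        shift = 8 * i if source_order == "little" else 8 * (size - 1 - i)
        mem[addr + i] = (value >> shift) & 0xFF


if __name__ == "__main__":
    image = bytearray([0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00, 0x00])
    print(hex(load_word(image, 0)))            # word access
    print(hex(load_word(image, 1, size=2)))    # unaligned halfword, same bytes
    store_word(image, 4, 0xCAFEBABE)
    print(image[4:8].hex())
```

Because the memory image is never reordered, the same bytes can be read back as bytes, halfwords, or words at any alignment, at the cost of several loads and shifts per access.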
In any case, the hardest thing about binary translation is handling
the stuff that isn't being disassembled - namely OS calls.