Re: Decompilers

markh@csd4.csd.uwm.edu (Mark)
Sat, 11 Sep 1993 23:49:42 GMT

          From comp.compilers

Related articles
Decompilers dkoski@cs.arizona.edu (David Koski) (1993-08-30)
Re: Decompilers markh@csd4.csd.uwm.edu (1993-09-11)
| List of all articles for this month |
Newsgroups: comp.compilers
From: markh@csd4.csd.uwm.edu (Mark)
Keywords: disassemble
Organization: Computing Services Division, University of Wisconsin - Milwaukee
References: 93-09-021
Date: Sat, 11 Sep 1993 23:49:42 GMT

David Koski <dkoski@cs.arizona.edu> writes:
>I am looking for any references on decompiling. This is for a possible
>school project (depending on the data in said references).


The disassemblers I wrote for the 8051 and 8085 will extract all the code
segments that can be reached from the entry points given to it, except for
those reachible only through indirect jumps or stack manipulations. A
reference count of each entry address and each jump/call destination is
also kept. They also generate labels from symbol table input.


There is enough information created during disassembly to extract out the
control flow graph, which provides the skeleton on which you hang your
data flow analysis. In turn, the data flow analysis will allow you to
resolve the indirect jumps and stack-generated jumps, which results in a
further refinement of the data flow analysis. All you're doing here is
making a back end to an assembly-to-high level language compiler, using
more or less common techniiques to compiler writing.


Copies of the disassemblers can be found at the FTP site csd4.csd.uwm.edu
in /pub/compilers/8080/*, and /pub/8051/assem/das.*, if you want to do
anything with them.


Note that the Halting Problem has no bearing at all on the ability to
resolve indirect jumps, since the processor you're working with will
ALWAYS have a finite address space and a finite number space (the Halting
Problem requires the machine to have an infinite address space with
arithmetic done over the entire set of integers). In practice, you
actually do have to make essential use of the finiteness of the address
space in resolving indirect jumps, in many places, too. So it's not an
insignificant observation. Because of the finiteness of the number space,
reasoning over the machine's arithmetic is likewise computeable and
usually tractible in practice.


I decompile assembly code by hand all the time as a matter of course and
may in the future formalize some of the methods and upgrade the
disassemblers to decompilers.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.