Re: Taking an AST back into C

torbenm@diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=)
1 Dec 2004 23:04:04 -0500

          From comp.compilers

Related articles
Taking an AST back into C nb_no_spam@synthcom.com (Neil Bradley) (2004-11-28)
Re: Taking an AST back into C Martin.Ward@durham.ac.uk (Martin Ward) (2004-12-01)
Re: Taking an AST back into C torbenm@diku.dk (2004-12-01)
Re: Taking an AST back into C vbdis@aol.com (2004-12-01)
Re: Taking an AST back into C vbdis@aol.com (2004-12-05)
Re: Taking an AST back into C vbdis@aol.com (2004-12-05)
Re: Taking an AST back into C vbdis@aol.com (2004-12-11)
Re: Taking an AST back into C Martin.Ward@durham.ac.uk (Martin Ward) (2004-12-11)
Re: Taking an AST back into C vbdis@aol.com (2004-12-13)
| List of all articles for this month |

From: torbenm@diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=)
Newsgroups: comp.compilers
Date: 1 Dec 2004 23:04:04 -0500
Organization: Department of Computer Science, University of Copenhagen
References: 04-11-119
Keywords: decompile
Posted-Date: 01 Dec 2004 23:04:04 EST

"Neil Bradley" <nb_no_spam@synthcom.com> writes:


> I'm currently working on a code recompiler that translates binary code
> back into its C equivalent. Fortunately, just about every processor
> instruction can be represented as an expression (computing carry,
> overflow for example).
>
> I've created an abstract syntax tree for the binary code and am
> translating it back into C, and now I'm trying to figure out when to
> emit parenthesis in my expressions (so that operator precedence will
> be honored).
>
> Does anyone know of a rule or approach that would let me know when
> parentheses should be emitted? As near as I can tell, it's when the
> node on the right tree of the current operator has an operator that is
> of differing precedence, but that just doesn't seem complete enough.


Placing parentheses is the least of your problems when decompiling.
There are standard methods for doing that: If you have an abstract
syntax tree of arithmetic operations (or, equivalently, a stack
program), you first compute the infix string representation of the two
subexpressions (subtrees / substacks) and the priority of the
top-level operator of each (including whether it is left- right- or
non-associative). You then compare these precedences with the
operator of the whole expression.


If the priority of the topmost operator is greater (i.e., binds
tighter) than that of a subexpression, you add parentheses around this
subexpression. If the priority of the topmost operator is lower than
that of a subexpression, you don't add parentheses around this. If
the priority of the topmost operator is the same as that of a of the
subexpression, you look at the associativity. If non-associative, you
add parentheses around the subexpression. If left-associative, you
add parentheses around the subexpression if it is the right operand of
the topmost operator but not if it is the left operand. If
right-assiciative, you add parenthesis around the subexpression if it
is the left operand of the topmost operator but not if it is the right
operand.


Note that this implies that operators with identical priority have
identical associativity. This is the usual case, but if not you
shoudl add parentheses around the subexpression regardless if it to
the left or right.


> [There's been a lot of work on decompilers over the years, including
> one I tried that disassembled x86 object code and turned it into C.
> It worked, but the results were so low-level that they were useless.
> -John]


If the code doesn't use any data structures, it is usually not so bad,
as operators translate fairly directly back into high-level operators.
Your main problem will be to translate register and stack references
back into variable references, but if you know the layout of
activation records, that's doable too. Control structure isn't so
bad, as loops and if-then-else constructs are fairly easy to detect.
The main problem is data structures, as different data structures are
translated into similar code. Alan Mycroft had a paper called
"type-based decompilation" at ESOP'99 (Springer LNCS 1576) that
attempted to do this, but admitted that the method doesn't work for
all programs.


                Torben Mogensen





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.