Re: convert x86 assembly to c ?
1 Oct 2000 00:26:05 -0400

          From comp.compilers

Newsgroups: comp.compilers
Date: 1 Oct 2000 00:26:05 -0400
Organization: - Before you buy.
References: 00-09-172
Keywords: decompile

    "Lynn McGuire" <> wrote:
>Does anyone know of a good tool to convert x86 assembly to C code ?
>Lynn McGuire
>[This question has come up over the years many times, and the answer is
>that it's a really hard problem. Austin Code Works did an x86 to C
>translator ten or 15 years ago which worked, but wasn't very useful
>because the C code was just a transliteration of the x86 code with
>variable names like ax and ebp. I've seen some work on recovering

I've done some work on this and on other source translation, and the
answer is a flexible translation system that can easily LEARN, or at
least be taught, the idioms of the particular source language (and
programmer). You start with a competent translation, recognize the
areas of poor understanding, then attack them with a tool that can
specialize on each particular pattern and produce better output. Of
course, it must be capable of simple arithmetic expression
decompilation, and it must recognize logical expressions, which are
usually expressed as a comparison followed by conditional SKIPs or
JMPs.
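
A minimal sketch of the compare-then-jump recognition described above.
Everything here is an assumption made for illustration: Intel-style
operand order, signed jumps only, and the names JUMP_TAKEN and
fold_compare_jump are hypothetical. It only folds each pair into an
"if ... goto"; turning the gotos into structured blocks would be a
later pass.

```python
import re

# Jump mnemonic -> C operator that holds when the jump is taken.
# Signed comparisons only; an illustrative subset, not a full x86 table.
JUMP_TAKEN = {"je": "==", "jne": "!=", "jl": "<", "jle": "<=", "jg": ">", "jge": ">="}

CMP = re.compile(r"cmp\s+(\w+)\s*,\s*(\w+)$")
JCC = re.compile(r"(j\w+)\s+(\w+)$")

def fold_compare_jump(lines):
    """Rewrite each 'cmp a, b' + 'jcc label' pair as one C 'if ... goto'."""
    out = []
    i = 0
    while i < len(lines):
        cmp_m = CMP.match(lines[i].strip())
        jcc_m = JCC.match(lines[i + 1].strip()) if i + 1 < len(lines) else None
        if cmp_m and jcc_m and jcc_m.group(1) in JUMP_TAKEN:
            a, b = cmp_m.groups()
            op = JUMP_TAKEN[jcc_m.group(1)]
            out.append(f"if ({a} {op} {b}) goto {jcc_m.group(2)};")
            i += 2  # both instructions were consumed by the pattern
        else:
            out.append(lines[i].strip())  # no pattern here; pass through
            i += 1
    return out
```

For example, fold_compare_jump(["cmp eax, ebx", "jge skip"]) yields a
single line, "if (eax >= ebx) goto skip;".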

One example of superfluous code is a sequence that pushes several
registers before calling a routine and pops them back afterwards. Any
programmer would understand which registers are being protected from
the routine and which are arguments. The translator should find the
routine in the database of procedures, suppress the protective
push/pops, and use the arguments, result type, and result location in
the generated function call. Does it
return a condition code or pointer or "int" in which register? Is the
condition code normally ==/>/< or .overflo. and what is TRUE in that
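
A rough sketch of the push/call/pop folding described above. The
procedure database, its single entry (strlen_), and the assumption
that arguments travel in registers are all invented for illustration;
a real database would also have to record condition-code results.

```python
# Hypothetical procedure database: argument registers and result location.
# The entry below is made up; it is not taken from any real code base.
PROCEDURES = {
    "strlen_": {"args": ["esi"], "result": "eax"},
}

def fold_call(window):
    """Collapse 'push r.. / call f / pop r..' into one C-style call.

    Returns the window unchanged when the routine is unknown or the
    pushes and pops do not balance, so nothing is lost by guessing.
    """
    pushes, i = [], 0
    while i < len(window) and window[i].startswith("push "):
        pushes.append(window[i].split()[1])
        i += 1
    if i == len(window) or not window[i].startswith("call "):
        return window
    name = window[i].split()[1]
    rest = window[i + 1:]
    pops = [line.split()[1] for line in rest if line.startswith("pop ")]
    proc = PROCEDURES.get(name)
    # Protection pushes must be undone in reverse order, with nothing else mixed in.
    balanced = len(pops) == len(rest) and pops == pushes[::-1]
    if proc is None or not balanced or proc["result"] in pushes:
        return window
    return [f"{proc['result']} = {name}({', '.join(proc['args'])});"]
```

With this table, the window push ebx / push ecx / call strlen_ /
pop ecx / pop ebx collapses to "eax = strlen_(esi);", while a call to
a routine missing from the database is left alone for a later run.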

Successive runs of the translator will remove the superfluous code,
and the result will be more maintainable, along with the comments,
which must be preserved. Some comments are not worth saving. For
one customer, we taught the translator to suppress stylized comments
about the binary point for fixed-point binary arithmetic, since one of
the results of the translation was to go from assembly on a 16-bit mini
to 64-bit floating point. That translator also recognized that shifts
could be multiplies/divides by powers of 2 when dealing with arithmetic
items, and address resolution adjustments (which could be ignored) when
dealing with pointers. I like to think of these successive runs as
hacking at the weeds around the object to see if the land is level,
flat, rolling, rutted, or even SWAMPY.
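
The shift rule above might be sketched like this. The x86-style
mnemonics (shl, shr, sar), the "arith"/"pointer" tags, and the function
name translate_shift are assumptions for illustration; the original
translator targeted a 16-bit mini whose mnemonics are not given here.

```python
def translate_shift(op, operand, count, kind):
    """Turn a shift instruction into a C expression string.

    kind is a hypothetical type tag: "arith" for arithmetic items,
    "pointer" for address-resolution adjustments, which are dropped.
    """
    if kind == "pointer":
        return operand  # the adjustment carries no meaning in the C output
    factor = 1 << count  # shifting by n bits scales by 2**n
    if op == "shl":
        return f"{operand} * {factor}"
    if op in ("shr", "sar"):
        # Caveat: on negative integers an arithmetic right shift rounds
        # toward minus infinity, while C's '/' truncates toward zero.
        return f"{operand} / {factor}"
    raise ValueError(f"not a shift instruction: {op}")
```

For instance, translate_shift("shl", "x", 3, "arith") gives "x * 8",
and the same shift on an operand tagged "pointer" gives back just the
operand.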

The critical elements in the translator are fast turnaround and ease
of extension to the database and logic. Recognizing larger patterns
will produce more maintainable code.

In the case of Lynn's code, the real problem is the macros. Expanding
them and then trying to decompile the instructions to C will never work
as well as recognizing the macros themselves as instructions and
constructing productions based on their intended functionality.
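
One way to picture treating macros as instructions is a table of
productions keyed by macro name. The macro names below (SAVE_REGS,
COPY_BLOCK, CLEAR_WORD) and their C templates are entirely hypothetical;
Lynn's actual macros are not shown in this thread.

```python
# Hypothetical productions: macro name -> C template, or None to emit nothing.
MACRO_PRODUCTIONS = {
    "SAVE_REGS": None,                       # bookkeeping only, no C equivalent
    "COPY_BLOCK": "memcpy({0}, {1}, {2});",  # dst, src, byte count
    "CLEAR_WORD": "{0} = 0;",
}

def translate_line(line):
    """Translate one unexpanded source line using the macro productions.

    Returns a C statement, None when the line produces no output, or a
    comment marking the line as not yet understood.
    """
    parts = line.replace(",", " ").split()
    if not parts:
        return None
    name, args = parts[0], parts[1:]
    if name in MACRO_PRODUCTIONS:
        template = MACRO_PRODUCTIONS[name]
        return template.format(*args) if template else None
    # Unknown lines are flagged so the next run can target them.
    return f"/* untranslated: {line.strip()} */"
```

Here translate_line("COPY_BLOCK dst, src, 64") produces
"memcpy(dst, src, 64);" directly, with no attempt to decompile the
instructions the macro would have expanded into.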

>code from Vaxes, again a while ago. On the other hand, there's a lot
>of work going on in binary translation, turning one kind of object
>code to another. Look at the comp.compilers archives for messages
>and conference announcements. -John]

Bob Sheff, Independent Consultant
available for work on source translation projects
and other interesting coding.
bsheff2 AT yahoo D O T C O M
