Re: Writing a disassembler ?

Jeff Kenton <jeffrey.kenton@comcast.net>
Sat, 11 Oct 2008 09:51:43 -0400

          From comp.compilers


From: Jeff Kenton <jeffrey.kenton@comcast.net>
Newsgroups: comp.compilers
Date: Sat, 11 Oct 2008 09:51:43 -0400
Organization: Compilers Central
References: 08-10-011
Keywords: disassemble
Posted-Date: 12 Oct 2008 08:16:53 EDT

I wrote a disassembler for a Motorola 88000 once. It's a RISC
architecture, so it only took me 4 hours. Writing one for an x86
architecture will take you considerably longer.


Read the whole file into an array at once -- memory is cheap.
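
As a rough illustration of reading everything up front, here is a minimal
Python sketch. The names load_image and word_at are made up for this
example, and the big-endian word order is an assumption about the target,
not something stated above.

```python
from pathlib import Path


def load_image(path):
    """Read the entire file into memory as one bytes object."""
    return Path(path).read_bytes()


def word_at(image, offset):
    """Fetch one 32-bit word at a byte offset (big-endian is assumed here)."""
    return int.from_bytes(image[offset:offset + 4], "big")
```

With the whole image in one bytes object, the decoder can index and slice
freely instead of juggling partial reads.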


For my disassembler, I had a table consisting of a mask, an opcode
value, instruction class and a string for the opcode name. For each
instruction I would mask it and compare the value. When I found a
match, I printed the opcode and used the instruction class (and a BIG
switch statement) to decide how to print the operands. Its purpose was
just disassembly to text -- no fancy interface, no understanding of the
code, and no attempt to track variable names or jump targets. Its speed
was only limited by how fast you could printf to the screen.
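
The mask/value/class/name table can be sketched roughly as follows. This
is only an illustration of the scheme: the encodings below belong to a
made-up 32-bit ISA (they are NOT real 88000 opcodes), and Entry, TABLE and
disassemble_word are hypothetical names. A chain of ifs on the class
stands in for the big switch statement.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mask: int     # which bits identify the instruction
    value: int    # what those bits must equal
    iclass: str   # instruction class: selects the operand format
    name: str     # opcode name to print


# Hypothetical encoding for a made-up ISA (not the 88000):
# bits 31-26 opcode, 25-21 rd, 20-16 rs1, then rs2 in 15-11 or imm16 in 15-0.
TABLE = [
    Entry(0xFFFFFFFF, 0x00000000, "none", "nop"),
    Entry(0xFC0007FF, 0x04000000, "reg3", "add"),
    Entry(0xFC0007FF, 0x04000001, "reg3", "sub"),
    Entry(0xFC000000, 0x08000000, "imm16", "addi"),
    Entry(0xFC000000, 0x0C000000, "branch", "br"),
]


def disassemble_word(word, pc=0):
    """Mask and compare against each table entry; format operands by class."""
    for e in TABLE:
        if (word & e.mask) == e.value:
            break
    else:
        return f".word 0x{word:08x}"          # no entry matched
    rd = (word >> 21) & 31
    rs1 = (word >> 16) & 31
    if e.iclass == "none":
        return e.name
    if e.iclass == "reg3":
        rs2 = (word >> 11) & 31
        return f"{e.name} r{rd}, r{rs1}, r{rs2}"
    if e.iclass == "imm16":
        return f"{e.name} r{rd}, r{rs1}, {word & 0xFFFF}"
    if e.iclass == "branch":
        disp = word & 0x03FFFFFF              # 26-bit signed word offset
        if disp & 0x02000000:
            disp -= 0x04000000
        return f"{e.name} 0x{(pc + 4 * disp) & 0xFFFFFFFF:08x}"
    raise ValueError(f"unhandled instruction class {e.iclass!r}")
```

A linear scan is used because the table is short and entries with wider
masks can simply be listed first; nothing in the post says how the
original table was ordered.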


For Intel x86 disassembly, you will need to handle multibyte opcodes
(mine had fixed-size 32-bit instructions). You could probably still do
it with a single table using appropriate multibyte masks and values, but
you might also choose to break it down by instruction length. In that
case, I still suggest using the same scheme -- first prefixes, then
single byte instructions, etc. The instruction class field can be your
guide to whether you need another byte and what to do with it.
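
A sketch of the "prefixes first, then single-byte opcodes" idea for a
very small subset of 32-bit x86 might look like this. It assumes 32-bit
protected mode and deliberately leaves out ModRM bytes, two-byte (0x0F)
opcodes and the 0x66/0x67 size prefixes; PREFIXES, ONE_BYTE, EXTRA and
decode_one are hypothetical names. The class field drives how many more
bytes are consumed.

```python
PREFIXES = {0xF0: "lock", 0xF2: "repne", 0xF3: "rep"}
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# (mask, value, class, name) for a handful of one-byte opcodes.
ONE_BYTE = [
    (0xFF, 0x90, "none", "nop"),
    (0xFF, 0xC3, "none", "ret"),
    (0xFF, 0xCC, "none", "int3"),
    (0xF8, 0x50, "reg", "push"),        # 50+r
    (0xF8, 0x58, "reg", "pop"),         # 58+r
    (0xF8, 0xB8, "reg_imm32", "mov"),   # B8+r imm32
    (0xFF, 0xEB, "rel8", "jmp"),
    (0xFF, 0xE8, "rel32", "call"),
    (0xFF, 0xE9, "rel32", "jmp"),
]

# The class says how many operand bytes follow the opcode byte.
EXTRA = {"none": 0, "reg": 0, "reg_imm32": 4, "rel8": 1, "rel32": 4}


def decode_one(code, pos, base=0):
    """Return (text, length), or None if the bytes are unknown or truncated."""
    start = pos
    prefixes = []
    while pos < len(code) and code[pos] in PREFIXES:
        prefixes.append(PREFIXES[code[pos]])
        pos += 1
    if pos >= len(code):
        return None
    op = code[pos]
    pos += 1
    for mask, value, iclass, name in ONE_BYTE:
        if (op & mask) == value:
            break
    else:
        return None
    n = EXTRA[iclass]
    if pos + n > len(code):
        return None
    operand = code[pos:pos + n]
    pos += n
    if iclass == "none":
        text = name
    elif iclass == "reg":
        text = f"{name} {REG32[op & 7]}"
    elif iclass == "reg_imm32":
        imm = int.from_bytes(operand, "little")
        text = f"{name} {REG32[op & 7]}, 0x{imm:x}"
    else:  # rel8 / rel32: target is relative to the next instruction
        disp = int.from_bytes(operand, "little", signed=True)
        text = f"{name} 0x{(base + pos + disp) & 0xFFFFFFFF:x}"
    return " ".join(prefixes + [text]), pos - start
```

For example, decode_one(bytes([0xB8, 0x78, 0x56, 0x34, 0x12]), 0) yields
("mov eax, 0x12345678", 5). A real x86 table would need many more classes
than this.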


For invalid instructions you have the problem of getting out of sync
with the intended instruction stream. It will eventually sync back up,
or else you can try something fancy to figure out what's going on.
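
One simple policy for bytes that don't decode (an assumption for
illustration, not necessarily what any particular tool does) is to print
the offending byte as data, advance by one, and let the stream sync back
up by itself. In this sketch linear_sweep and toy_decode are hypothetical
names, and toy_decode knows only x86 NOP and RET so the example stays
small.

```python
def linear_sweep(code, decode, base=0):
    """Yield (address, text); undecodable bytes are emitted as data."""
    pos = 0
    while pos < len(code):
        result = decode(code, pos)
        if result is None:
            # Unknown byte: show it as data and retry at the next byte.
            yield base + pos, f".byte 0x{code[pos]:02x}"
            pos += 1
        else:
            text, length = result
            yield base + pos, text
            pos += length


def toy_decode(code, pos):
    """Stand-in decoder that recognizes only NOP (0x90) and RET (0xC3)."""
    name = {0x90: "nop", 0xC3: "ret"}.get(code[pos])
    return (name, 1) if name else None
```

Any decoder that returns (text, length) or None can be plugged in as the
decode argument.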


The way you represent the disassembled instruction depends on what
you're going to use it for. Once you decide that, you'll also know what
to do with your invalid instructions.
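
If the goal is just a text listing, one possible representation is a
small record per instruction. The Insn class and its fields below are
assumptions made for this sketch; the post does not prescribe any
particular layout. An invalid instruction is the same record with valid
set to False, so a listing can still show it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    address: int          # where the instruction starts
    raw: bytes            # the bytes it was decoded from
    mnemonic: str         # opcode name, or ".byte" for undecodable data
    operands: tuple = ()  # operand strings, already formatted
    valid: bool = True    # False when the bytes did not decode

    def __str__(self):
        ops = ", ".join(self.operands)
        line = f"{self.address:08x}  {self.raw.hex(' '):<15}  {self.mnemonic} {ops}"
        return line.rstrip()
```

A tool that does more than print (tracking jump targets, say) would
likely want richer fields than preformatted strings.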


Don't multi-thread. This is a disassembler, not a computer science project.


Good luck, and have fun.


jeff




So and so wrote:
> 1. Which data structure should store the values I read ? A hash table
> or a Tree ? Or a combination of both ? (trie) Should the tree be
> balanced ? If not, will it cost in efficiency or whether balancing it
> will cost in efficiency ?
>
> 2. What about invalid instructions ? Should I strip them the moment I
> detect they're invalid or should they be stored FFU ?
>
> 3. Which data structure should hold the final result of the
> disassembled instruction ?
>
> 5. Should the disassembler itself be multi threaded or one program
> which does everything step-by-step and if it will be multi threaded -
> how can I handle or parse different instructions ? or handle
> synchronization ?




---------------------------------------------------------------------
= Jeff Kenton http://home.comcast.net/~jeffrey.kenton =
---------------------------------------------------------------------


