Re: win32/linux executables

"Carsten Kuckuk" <ck@kuckuk.com>
5 Jan 2002 01:45:38 -0500

          From comp.compilers


From: "Carsten Kuckuk" <ck@kuckuk.com>
Newsgroups: comp.compilers
Date: 5 Jan 2002 01:45:38 -0500
Organization: Compilers Central
References: 02-01-010
Keywords: code, linker
Posted-Date: 05 Jan 2002 01:45:38 EST

> where do i start? intel's assembly specification ? windows APIs? any
> resource that would help me write a basic compiler would help. i can
> then try and expand that further and seek help as and when i get
> stuck.


If you want to do it the proper way, follow these steps:


(1) Study the specification document for the executable file format on your
targeted operating system. [Win32: Open the MSDN Library, select
Specifications->Microsoft Portable Executable and Common Object File Format
Specification. You can't find it online anymore. Linux: ELF file
specification. I'm less sure of the best source here. A Google search turned
up http://developer.intel.com/vtune/tis.htm but this seems to be pretty
outdated.]


(2) Study the processor manufacturer's programmer's manual for the
microprocessor that you target. [At
http://developer.intel.com/design/Pentium4/manuals/ you can download the
manuals describing the inner workings of the Pentium 4 processor for example
as PDF files.]


(3) Find out what the _binary_ calling interface specification between a
user program and operating system functionality on your targeted operating
system is. [Win32: You have to call entry points of system DLLs like
KERNEL32.DLL, GDI32.DLL or USER32.DLL. Linux: If you're sane, you'll treat
GLIBC as the OS API. If you're insane, you'll make kernel calls using
INT 80h. You can find the actual mapping of call names to call numbers in
/usr/include/asm/unistd.h]


(4) Implement an object code emitter and a linker.
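

As a first exercise with the file formats from step (1), it helps to be able
to recognize them by their headers, if only to check what your own emitter
produces later. The following is a minimal sketch in Python, not a full
parser. The function name identify_executable is my own invention. It relies
only on the well-known signatures: ELF files start with 0x7F 'E' 'L' 'F'
followed by a class byte and a byte-order byte, and PE files start with the
DOS 'MZ' magic and store the file offset of the 'PE\0\0' signature as a
32-bit little-endian value at offset 0x3C.

```python
import struct


def identify_executable(data):
    """Return a short description of the executable format found in `data`.

    Hypothetical helper for illustration; it only inspects the signatures.
    """
    # ELF: bytes 0..3 are the magic, byte 4 is the class, byte 5 the byte order.
    if data[:4] == b"\x7fELF" and len(data) >= 6:
        bits = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown class")
        order = {1: "little-endian", 2: "big-endian"}.get(
            data[5], "unknown byte order")
        return "ELF, %s, %s" % (bits, order)

    # PE: the DOS header begins with 'MZ'; the dword at 0x3C points at 'PE\0\0'.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "PE (Win32)"
        return "MZ (DOS executable, no PE signature)"

    return "unknown"
```

To try it on a real file, read the first few hundred bytes in binary mode and
pass them in, for example identify_executable(open(path, "rb").read(512)).
Note that a PE file whose DOS stub is longer than the bytes you read will be
reported as a plain MZ executable.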


If you're an experienced assembler programmer, and an experienced C (C++,
Java, whatever your implementation language is) programmer, then it will
take you at least one full week just to read and understand the documents.
Then add at least a full month to write startup code in assembly language,
object code generation, and final linking into an executable file. Then add
six months of debugging. There's no way around it if you want to do it all
by yourself.


You can simplify the task by choosing an easier file format, an easier
operating system, and an easier CPU. The simplest target platform I can
think of is MS-DOS, and in particular the .COM file format. The file can be
up to 64KB minus 256 bytes in size. It contains only the raw bytes of your
code (and any data you place among them). No relocation, no fixup, no
external references, no nothing. Upon loading, a new memory segment will be
allocated, and the contents of the file will be loaded at offset 0x0100.
Execution will start there, with CS=DS=SS=ES. The command line is stored at
0x0080..0x00ff (a length byte followed by the text). In order to terminate
the program you call offset 0, which holds an INT 20h instruction. The
interface to the operating system is through interrupts. The best source of
documentation freely available is Ralf Brown's Interrupt List at
http://www-2.cs.cmu.edu/~ralf/files.html. As for the processor, you only
have to learn the 8086/8088 CPU, which is far less time-consuming than the
Pentium. On this scale, everything stays manageable in a hobbyist's time
frame and you still learn the essential parts.
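

To give an idea of how small a .COM "code emitter" can be, here is a rough
sketch in Python that builds a complete program image printing a message.
The function name build_com is made up for this example. It assumes the
usual DOS services: INT 21h with AH=09h prints the '$'-terminated string at
DS:DX, and INT 20h terminates the program. Since the file is loaded at
offset 0x0100, the address of the message is simply 0x0100 plus the length
of the code in front of it.

```python
def build_com(message):
    """Assemble a tiny .COM image that prints `message` and exits.

    Hypothetical helper for illustration; 8086 opcodes are written by hand.
    """
    text = message.encode("ascii")
    if b"$" in text:
        raise ValueError("'$' terminates the string for DOS function 09h")
    text += b"$"

    code_len = 9                    # total size of the instructions below
    msg_addr = 0x0100 + code_len    # image is loaded at offset 0x0100

    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg_addr (little-endian)
        0xCD, 0x21,                            # int 21h  (print string at DS:DX)
        0xCD, 0x20,                            # int 20h  (terminate program)
    ])
    assert len(code) == code_len

    image = code + text
    if len(image) > 0x10000 - 0x100:           # 64KB minus 256 bytes
        raise ValueError("image too large for a .COM file")
    return image
```

Writing the result of build_com("Hello") to a file named HELLO.COM in binary
mode should give you something you can run under MS-DOS or an emulator. No
header, no linker, no relocation: the bytes above are the whole file.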


If this is not an option for you, then I would suggest that you avoid the
bits-and-bytes level of dealing with the file format yourself: have your
compiler emit symbolic assembly language and let freely available assembler
and linker programs do the hard work for you. You still need to learn
assembly language and the binary calling interface to the operating system,
but you would save a lot of work. On Linux you can use the assembler and
linker that gcc invokes (as and ld) to do this dirty work. Take a look at
the manual pages of gcc. Pay particular attention to the -S option, and the
.s, .S, .o file endings.
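

To make the idea concrete, here is a toy "compiler" sketched in Python that
turns a list of integers into symbolic assembly for a main function
returning their sum. The name compile_sum is invented for this example, and
the output assumes an x86 Linux system, GNU assembler (AT&T) syntax, and the
usual convention that main returns its result in EAX. A real compiler would
of course emit far more than this, but the division of labour is the same.

```python
def compile_sum(terms):
    """Emit GNU assembler source for a `main` that returns sum(terms).

    Hypothetical helper for illustration; assumes x86 Linux, AT&T syntax.
    """
    if not terms:
        raise ValueError("need at least one term")

    lines = [
        "    .text",
        "    .globl main",
        "main:",
        "    movl $%d, %%eax" % terms[0],   # load the first term into EAX
    ]
    for term in terms[1:]:
        lines.append("    addl $%d, %%eax" % term)  # accumulate the rest
    lines.append("    ret")                 # return value is left in EAX
    return "\n".join(lines) + "\n"
```

If you save the output of compile_sum([40, 2]) as sum.s, then something like
"gcc sum.s -o sum" should assemble and link it against the C startup code,
and the exit status of ./sum should be the sum (its low 8 bits, to be
precise). Compare what you generate with what "gcc -S" produces for an
equivalent C function.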


HTH,


Carsten Kuckuk

