linker magic: dynamic meta handlers...

"cr88192" <cr88192@hotmail.com>
Mon, 23 Mar 2009 12:22:48 +1000

          From comp.compilers

Related articles
linker magic: dynamic meta handlers... cr88192@hotmail.com (cr88192) (2009-03-23)
Re: linker magic: dynamic meta handlers... DrDiettrich1@aol.com (Hans-Peter Diettrich) (2009-03-23)
Re: linker magic: dynamic meta handlers... cr88192@hotmail.com (cr88192) (2009-03-24)

From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Mon, 23 Mar 2009 12:22:48 +1000
Organization: albasani.net
Keywords: linker, question
Posted-Date: 23 Mar 2009 06:22:03 EDT

well, mostly, this is text from some of my specs, in this case mostly
dealing with my compiler doing funky codegen magic at link time.


I am wondering here if anyone has comments, or thinks I am approaching all
this in totally the wrong way, ...




the newest idea here is to add a feature to allow registering callbacks with
my linker (which links code at runtime) to allow the runtime to perform
automatic code generation, or to deal with metadata embedded into assembly
code or object files (in my case, I may use ELF or COFF externally, although
granted all this won't work with a conventional linker...).


as noted at the bottom, about the first time I did this was for glossing
over the calling convention issue on x86-64.


a later use had been for TLS, which had basically used '_XT_' as the prefix
and rode on top of some of the machinery for 'XCall'. the case of TLS had
introduced the use of separate arguments, which in this case were used to
encode the input/output register and the amount of space to reserve for the
variable. (note: the leading underscore here is part of the name, not the
default symbol prefix added by COFF or similar; inside COFF files the prefix
is '__XT_' due to that additional underscore).


in other places, I had not done any linker magic, but instead generated
fixed-form calls to operation thunks (typically passing input/output on the
stack and designed to make up for apparent holes in the x86 instruction
set). in some other cases I could generate big ugly inline globs of code, or
hard code calls back into the C-based runtime (technically, when done in the
backend/codegen machinery this is rather ugly, and may require hackery like
manually saving and restoring any in-use registers to avoid fouling up the
register allocator/codegen, ...).


thus, the reason for adding a feature for generalized meta-functions, is to
allow a more general and less awkward way to deal with some of these cases
(the call name can include info about things like input and output
registers, ...), thus allowing a more specialized piece of code to be
generated (and also allowing me to offload some of the complexity from the
compiler back into the runtime in some other cases).


similarly, one can also use this to "pretend" like x86 has features that it
doesn't, namely one can emit calls to magic functions, which may then act as
if some particular instruction existed, ...




<--
Idea:
This will specify how runtime code could register itself with the linker,
such that attempts to resolve certain undefined symbols may be routed back
into the runtime, which is expected to produce code for the requested
function (as in, special handler thunks), or may also be used in supplying
data.


If the handler builds a code fragment which exports the given symbol name,
all further attempts to resolve this symbol will use the symbol exported in
the generated code fragment. The other option will be simply to return a
pointer to an anonymous thunk, allowing each request to be resolved
potentially to a different address.
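
The two resolution modes above could be sketched roughly as follows. This is
only an illustration under assumptions: the export table, demo_export,
demo_resolve, the simplified demo_gen_ft signature and the two example
generators are all hypothetical names, and the "thunks" are placeholder
bytes rather than generated code.

```c
#include <string.h>

#define DEMO_MAX_SYMS 32

/* Hypothetical table of symbols exported by generated code fragments. */
static struct { char name[64]; void *addr; } demo_exports[DEMO_MAX_SYMS];
static int demo_n_exports;

/* What a handler would call when its fragment exports the symbol. */
static int demo_export(const char *sym, void *addr)
{
    if (demo_n_exports >= DEMO_MAX_SYMS || strlen(sym) >= 64) return -1;
    strcpy(demo_exports[demo_n_exports].name, sym);
    demo_exports[demo_n_exports].addr = addr;
    demo_n_exports++;
    return 0;
}

static void *demo_lookup(const char *sym)
{
    int i;
    for (i = 0; i < demo_n_exports; i++)
        if (strcmp(demo_exports[i].name, sym) == 0)
            return demo_exports[i].addr;
    return NULL;
}

/* Simplified handler type for this sketch only. */
typedef void *(*demo_gen_ft)(const char *sym);

/* Resolve an undefined symbol: reuse an exported definition if one
 * exists, otherwise ask the handler and see whether it exported one. */
static void *demo_resolve(const char *sym, demo_gen_ft gen)
{
    void *p = demo_lookup(sym);
    void *q;
    if (p) return p;            /* exported by an earlier request */
    p = gen(sym);               /* route the request to the handler */
    q = demo_lookup(sym);       /* did the handler export the name? */
    return q ? q : p;           /* if not, p is an anonymous thunk */
}

/* Placeholder "thunk" storage; real code would emit instructions. */
static unsigned char demo_pool[8];
static int demo_pool_used;

/* Example handler that exports the symbol it was asked for. */
static void *demo_gen_named(const char *sym)
{
    void *p = &demo_pool[demo_pool_used++ % 8];
    demo_export(sym, p);
    return p;
}

/* Example handler that returns a fresh anonymous thunk each time. */
static void *demo_gen_anon(const char *sym)
{
    (void)sym;
    return &demo_pool[demo_pool_used++ % 8];
}
```

With demo_gen_named, repeated requests for the same symbol resolve to one
address; with demo_gen_anon, each request may get a different one.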


The request may also be passed arguments, which will describe the specifics
of the requested thunk or data (such as specific types, class names,
register names, numerical args, ...).


The calling convention of any generated thunks is purely a matter of
agreement between the caller and callee. The assumed use here is to generate
specialized code fragments, as opposed to more general purpose functions, so
typically any combination of stack, registers, shared variables/memory, ...
may be used.


It will be assumed (as a matter of practice) that the same runtime code will
be responsible both for generating the request names and for generating the
thunks.


An example use case would be for some of the runtime code to register itself
with the compiler, and when generating code for particular cases may insert
calls to meta-handlers, which may be capable of tasks which would not be
reasonable to do inline.




Symbol Structure:
_XM_<handler> ['__' <arg>]*


Where handler and each arg are strings mangled according to similar rules as
in XCall.


Each string will be mangled by replacing certain characters with escape
sequences:
'_' with '_1';
';' with '_2';
'[' with '_3';
'(' with '_4';
')' with '_5'.


Alphanumeric characters are embedded unchanged.
'_9xx' encodes a character in the range of 1 to 255;
'_0xxxx' encodes a character in the 16-bit unicode space (BMP).
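
As a rough illustration of the rules above, the following sketch mangles a
string and builds a request symbol from a handler name and arguments. The
function names xm_mangle and xm_build_symbol are made up for this example.
It also assumes that 'xx' in '_9xx' means two uppercase hex digits, which
the text does not state, and it does not produce the '_0xxxx' form.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Mangle one handler or argument string using the escape table.
 * Assumption: '_9xx' carries two hex digits; bytes only, no '_0xxxx'.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int xm_mangle(const char *src, char *dst, size_t dstsz)
{
    size_t n = 0, k;
    char tmp[8];

    if (dstsz == 0) return -1;
    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;
        switch (c) {
        case '_': strcpy(tmp, "_1"); break;
        case ';': strcpy(tmp, "_2"); break;
        case '[': strcpy(tmp, "_3"); break;
        case '(': strcpy(tmp, "_4"); break;
        case ')': strcpy(tmp, "_5"); break;
        default:
            if (isalnum(c)) {
                tmp[0] = (char)c;       /* alphanumerics pass through */
                tmp[1] = 0;
            } else {
                snprintf(tmp, sizeof tmp, "_9%02X", (unsigned)c);
            }
            break;
        }
        k = strlen(tmp);
        if (n + k + 1 > dstsz) return -1;
        memcpy(dst + n, tmp, k);
        n += k;
    }
    dst[n] = 0;
    return 0;
}

/* Build "_XM_<handler>__<arg>__<arg>..." from unmangled parts.
 * 'args' is a NULL-terminated list and may itself be NULL. */
static int xm_build_symbol(const char *handler, const char **args,
                           char *dst, size_t dstsz)
{
    size_t n = 4;

    if (dstsz < 5) return -1;
    strcpy(dst, "_XM_");
    if (xm_mangle(handler, dst + n, dstsz - n) < 0) return -1;
    n += strlen(dst + n);
    for (; args && *args; args++) {
        if (n + 3 > dstsz) return -1;
        memcpy(dst + n, "__", 3);       /* separator plus terminator */
        n += 2;
        if (xm_mangle(*args, dst + n, dstsz - n) < 0) return -1;
        n += strlen(dst + n);
    }
    return 0;
}
```

Since every '_' inside a string is escaped to '_' followed by a digit, a
mangled string never contains '__', so the separator stays unambiguous.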




Callback:
typedef void *(*basm_meta_ft)(char *sym, char *name, char **args);
int BASM_RegisterLinkMeta(char *name, basm_meta_ft fcn);


Register a meta-handler.
The 'sym' argument is the raw symbol, whereas 'name' and 'args' are the
parsed and unmangled names and arguments (passed as a NULL-terminated list).
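
A minimal sketch of how a linker might route an '_XM_' symbol to such a
callback is given below. Only the basm_meta_ft typedef comes from the text
above; demo_register_meta is a hypothetical stand-in for
BASM_RegisterLinkMeta (whose implementation is not shown here), and
demo_unmangle, demo_resolve_meta and demo_getvar_handler are invented for
illustration. It assumes '_9xx' uses hex digits and ignores '_0xxxx'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*basm_meta_ft)(char *sym, char *name, char **args);

#define DEMO_MAX_HANDLERS 16
#define DEMO_MAX_ARGS     8

static struct { char name[64]; basm_meta_ft fcn; }
    demo_handlers[DEMO_MAX_HANDLERS];
static int demo_n_handlers;

/* Hypothetical stand-in for BASM_RegisterLinkMeta. */
static int demo_register_meta(const char *name, basm_meta_ft fcn)
{
    if (demo_n_handlers >= DEMO_MAX_HANDLERS || strlen(name) >= 64)
        return -1;
    strcpy(demo_handlers[demo_n_handlers].name, name);
    demo_handlers[demo_n_handlers].fcn = fcn;
    demo_n_handlers++;
    return 0;
}

/* Undo the escapes in place (the result is never longer). */
static void demo_unmangle(char *s)
{
    char *d = s;
    char hex[3] = { 0, 0, 0 };
    while (*s) {
        if (*s != '_') { *d++ = *s++; continue; }
        s++;                            /* skip '_' and look at the code */
        switch (*s) {
        case '1': *d++ = '_'; s++; break;
        case '2': *d++ = ';'; s++; break;
        case '3': *d++ = '['; s++; break;
        case '4': *d++ = '('; s++; break;
        case '5': *d++ = ')'; s++; break;
        case '9':
            if (s[1] && s[2]) {
                hex[0] = s[1]; hex[1] = s[2];
                *d++ = (char)strtol(hex, NULL, 16);
                s += 3;
            } else {
                s += strlen(s);         /* truncated escape: stop */
            }
            break;
        default: break;                 /* unknown escape: drop the '_' */
        }
    }
    *d = 0;
}

/* Split "_XM_<handler>__<arg>..." at each "__", unmangle the pieces,
 * and call the matching handler. Returns NULL if nothing matches. */
static void *demo_resolve_meta(const char *sym)
{
    char buf[256], *args[DEMO_MAX_ARGS + 1], *p;
    int i, n = 0;

    if (strncmp(sym, "_XM_", 4) != 0 || strlen(sym) >= sizeof buf)
        return NULL;
    strcpy(buf, sym + 4);
    p = buf;
    while (n < DEMO_MAX_ARGS && (p = strstr(p, "__")) != NULL) {
        *p = 0;                         /* terminate the previous piece */
        p += 2;
        args[n++] = p;
    }
    args[n] = NULL;
    demo_unmangle(buf);
    for (i = 0; i < n; i++) demo_unmangle(args[i]);
    for (i = 0; i < demo_n_handlers; i++)
        if (strcmp(demo_handlers[i].name, buf) == 0)
            return demo_handlers[i].fcn((char *)sym, buf, args);
    return NULL;
}

/* Example handler: a real one would emit machine code; this one only
 * records the request and returns a placeholder address. */
static char demo_seen[128];
static void *demo_getvar_handler(char *sym, char *name, char **args)
{
    static unsigned char placeholder_thunk[1];
    (void)sym;
    snprintf(demo_seen, sizeof demo_seen, "%s(%s,%s)", name,
             args[0] ? args[0] : "",
             (args[0] && args[1]) ? args[1] : "");
    return placeholder_thunk;
}
```

For instance, after registering demo_getvar_handler under "get_var", a
request for '_XM_get_1var__rax__16' would reach it with name "get_var" and
args { "rax", "16", NULL }.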




Meta Triggers


A meta trigger is similar to, but different from, a meta handler.
A meta trigger will be called after a piece of code is linked, and will be
passed the address of each trigger symbol defined in that code.


This could be used for passing info from newly linked code into the runtime.


Similarly, each meta-triggered symbol is to have a unique name, even if this
means that an extra argument is provided simply to serve as a gensym.


Additionally, trigger requests may be queued until an appropriate handler is
registered, but each symbol will only be handled once (unless it is later
re-linked).




The symbol structure for triggers is:
_XN_<handler> ['__' <arg>]*


And uses the callback:
typedef void (*basm_mtrg_ft)(char *sym, char *name, char **args, void *ptr);
int BASM_RegisterLinkMetaTrigger(char *name, basm_mtrg_ft fcn);
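
The queueing and handle-once behaviour described above might look roughly
like the sketch below. Only the basm_mtrg_ft typedef is taken from the spec;
demo_register_trigger is a hypothetical stand-in for
BASM_RegisterLinkMetaTrigger, demo_trigger_linked stands for whatever the
linker does after placing an '_XN_' symbol, and argument parsing and
re-linking are left out.

```c
#include <string.h>

typedef void (*basm_mtrg_ft)(char *sym, char *name, char **args, void *ptr);

#define DEMO_MAX 16

/* One linked trigger symbol, handled at most once. */
struct demo_pending { char sym[64]; char name[32]; void *ptr; int done; };
static struct demo_pending demo_queue[DEMO_MAX];
static int demo_n_queue;

static struct { char name[32]; basm_mtrg_ft fcn; } demo_trig[DEMO_MAX];
static int demo_n_trig;

static basm_mtrg_ft demo_find_trigger(const char *name)
{
    int i;
    for (i = 0; i < demo_n_trig; i++)
        if (strcmp(demo_trig[i].name, name) == 0)
            return demo_trig[i].fcn;
    return NULL;
}

static void demo_fire(struct demo_pending *q, basm_mtrg_ft fcn)
{
    char *no_args[] = { NULL };     /* argument parsing omitted here */
    if (q->done) return;            /* each symbol is handled only once */
    q->done = 1;
    fcn(q->sym, q->name, no_args, q->ptr);
}

/* Called after a trigger symbol has been linked at address 'ptr'.
 * Fires at once if a handler exists, otherwise the entry stays queued. */
static int demo_trigger_linked(const char *sym, const char *name, void *ptr)
{
    struct demo_pending *q;
    basm_mtrg_ft fcn;

    if (demo_n_queue >= DEMO_MAX || strlen(sym) >= sizeof q->sym ||
        strlen(name) >= sizeof q->name)
        return -1;
    q = &demo_queue[demo_n_queue++];
    strcpy(q->sym, sym);
    strcpy(q->name, name);
    q->ptr = ptr;
    q->done = 0;
    fcn = demo_find_trigger(name);
    if (fcn) demo_fire(q, fcn);
    return 0;
}

/* Hypothetical stand-in for BASM_RegisterLinkMetaTrigger: records the
 * handler, then flushes any queued requests waiting for this name. */
static int demo_register_trigger(const char *name, basm_mtrg_ft fcn)
{
    int i;
    if (demo_n_trig >= DEMO_MAX ||
        strlen(name) >= sizeof demo_trig[0].name)
        return -1;
    strcpy(demo_trig[demo_n_trig].name, name);
    demo_trig[demo_n_trig].fcn = fcn;
    demo_n_trig++;
    for (i = 0; i < demo_n_queue; i++)
        if (strcmp(demo_queue[i].name, name) == 0)
            demo_fire(&demo_queue[i], fcn);
    return 0;
}

/* Example handler: counts calls and remembers the last address. */
static int demo_fired;
static void *demo_last_ptr;
static void demo_count_handler(char *sym, char *name, char **args, void *ptr)
{
    (void)sym; (void)name; (void)args;
    demo_fired++;
    demo_last_ptr = ptr;
}
```

The 'done' flag is what keeps a queued symbol from being delivered twice if
a handler for the same name is registered again later.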
-->




This is older, but shows roughly the first time I started using automatic
code generation in the linker...


<--
Idea:


XCall is a calling convention for x86-64, and may be used in place of the
SysV or Win64 conventions. Its purpose is to simplify code generation in
some cases (such as when writing a compiler).


(Removed big chunk about argument passing, register usage, and
prologue/epilogue rules...
In short it is x86 cdecl modified for x86-64, with 16-byte alignment and the
addition of prologues and epilogues similar to those in Win64.)


Function Naming:


XCall will make use of name mangling for all names. This will simplify the
autogeneration of stubs when linking against existing code which may be
compiled to use a different calling convention (the mangled name will serve
to tell the stub generator how the stack frame is laid out, so that it can
re-package the arguments as needed).


The function name will include a prefix:
'_XC_' is used for ordinary functions and for calls to a function which
accepts a fixed number of arguments;
'_XV_' is used when calling vararg functions, and also for the call-target
of such a call (this symbol will be either an alias or a jump to the actual
function implementing the vararg function, or a conversion stub if an
inter-convention call is taking place, however this will never be the proper
name of the vararg function in question).


This prefix will be followed by a mangled version of the name and signature
string.


Note: Include section on Signature Strings if needed.


(Left out string rules, given they were given before.)


Note that with ordinary function calls the mangled name used by the caller
and the receiver are required to be equivalent.


The signature in this case will represent the arguments being accepted by
the receiver, and not the types of values on the stack from the POV of the
caller (it is reasonable that the types not match exactly between them, for
example, as the result of a cast or implicit type conversion).


In the case of vararg functions, the signature will represent the values
passed on the stack from the POV of the caller, and so the same target
function may be called by any number of possible names (it is then the
responsibility of the linker to locate the correct call-target for a given
name and signature).
-->

