Re: compiler and metadata, request opinions...

"cr88192" <>
Sat, 25 Apr 2009 13:15:03 -0700

          From comp.compilers

Related articles
compiler and metadata, request opinions... (cr88192) (2009-04-22)
Re: compiler and metadata, request opinions... (Hans-Peter Diettrich) (2009-04-24)
Re: compiler and metadata, request opinions... (cr88192) (2009-04-25)
Re: compiler and metadata, request opinions... (Hans-Peter Diettrich) (2009-04-27)
Re: compiler and metadata, request opinions... (cr88192) (2009-04-28)

From: "cr88192" <>
Newsgroups: comp.compilers
Date: Sat, 25 Apr 2009 13:15:03 -0700
References: 09-04-051 09-04-059
Keywords: design
Posted-Date: 27 Apr 2009 05:53:53 EDT

"Hans-Peter Diettrich" <> wrote in message
> cr88192 wrote:
>> I am actually using the same parser for all 3 languages (and C++ as well,
>> but this is a lower priority), where an internal "lang" variable is used
>> to remember which language is being processed at the time (and adapt
>> behavior as appropriate).
> I'd use different frontends (parser...) for each language, each building
> a canonical AST, where "canonical" means that the AST is understood by
> all following stages.

writing 3 parsers would mean maintaining 3 parsers, which is unnecessary
since most of the syntax is common between the languages...
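as a rough illustration of how one shared parser can serve several languages, here is a minimal C sketch in which a "lang" setting only changes which words are reserved. the names Lang, is_keyword, and the keyword tables are hypothetical and deliberately incomplete; a real shared parser would presumably vary in much more than its keyword set.

```c
#include <string.h>

/* Hypothetical language selector, standing in for the internal "lang" variable. */
typedef enum { LANG_C, LANG_CPP, LANG_JAVA, LANG_CSHARP } Lang;

/* Keywords shared by every language (illustrative, not exhaustive). */
static const char *common_kw[] = { "if", "else", "while", "for", "return", NULL };

/* Per-language extras (also illustrative only). */
static const char *c_kw[]      = { "struct", "union", "typedef", NULL };
static const char *cpp_kw[]    = { "struct", "union", "class", "namespace", "template", NULL };
static const char *java_kw[]   = { "class", "interface", "package", "import", NULL };
static const char *csharp_kw[] = { "struct", "class", "interface", "namespace", "using", NULL };

/* Linear search of a NULL-terminated word list. */
static int in_list(const char **list, const char *word)
{
    for (; *list; list++)
        if (strcmp(*list, word) == 0)
            return 1;
    return 0;
}

/* One lexer routine for all languages; only the reserved-word set depends on lang. */
int is_keyword(Lang lang, const char *word)
{
    if (in_list(common_kw, word))
        return 1;
    switch (lang) {
    case LANG_C:      return in_list(c_kw, word);
    case LANG_CPP:    return in_list(cpp_kw, word);
    case LANG_JAVA:   return in_list(java_kw, word);
    case LANG_CSHARP: return in_list(csharp_kw, word);
    }
    return 0;
}
```

the rest of the grammar can then be shared, with the few divergent constructs checking lang at the point where they are parsed.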

> Problematic are e.g. classes, which have different implementations and
> behaviour in C++, Java, .NET etc., so that the generated code for
> dealing with an object has to take into account the language's class
> model and lifetime rules.

I have mostly been developing a "common superset" approach.

there are actually several different types of classes and structs:
struct/union: good old C struct/union;
struct/union(1): shared between C++ classes/structs, and C# structs
(currently N/A in Java);
class: C#/Java class, '__gc class' (or '__class') in C++ (C++ defaults to
'__nogc class');
interface: C#/Java interface, exported as '__interface' in C++.

1: these use the same tags at present, but are structurally different (using
different tags may be a good idea here, but at present they are recognized
by the structural difference in the ASTs).
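the mapping from source keywords to these kinds could be sketched roughly as follows. this is a hypothetical C illustration: the AggKind values and classify_aggregate are made-up names, and the keyword spellings simply follow the list above (with a bare C++ 'class' defaulting to '__nogc class').

```c
#include <string.h>

/* Hypothetical tags for the aggregate kinds listed above. */
typedef enum {
    AGG_INVALID,     /* not allowed in this language */
    AGG_C_STRUCT,    /* plain C struct/union */
    AGG_VALUE_TYPE,  /* C++ class/struct/union, C# struct */
    AGG_GC_CLASS,    /* C#/Java class, C++ '__gc class' or '__class' */
    AGG_INTERFACE    /* C#/Java interface, C++ '__interface' */
} AggKind;

typedef enum { LANG_C, LANG_CPP, LANG_JAVA, LANG_CSHARP } Lang;

static int eq(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* Classify an aggregate declaration keyword according to the current language. */
AggKind classify_aggregate(Lang lang, const char *kw)
{
    switch (lang) {
    case LANG_C:
        return (eq(kw, "struct") || eq(kw, "union")) ? AGG_C_STRUCT : AGG_INVALID;
    case LANG_CPP:
        if (eq(kw, "__gc class") || eq(kw, "__class"))
            return AGG_GC_CLASS;
        if (eq(kw, "__interface"))
            return AGG_INTERFACE;
        /* a bare 'class' defaults to '__nogc class' */
        if (eq(kw, "class") || eq(kw, "__nogc class") || eq(kw, "struct") || eq(kw, "union"))
            return AGG_VALUE_TYPE;
        return AGG_INVALID;
    case LANG_CSHARP:
    case LANG_JAVA:
        if (lang == LANG_CSHARP && eq(kw, "struct"))
            return AGG_VALUE_TYPE;  /* C# structs; Java has none */
        if (eq(kw, "class"))
            return AGG_GC_CLASS;
        if (eq(kw, "interface"))
            return AGG_INTERFACE;
        return AGG_INVALID;
    }
    return AGG_INVALID;
}
```

giving the C++/C# value types their own tag, as here, is one way of not having to rely on the structural difference in the ASTs.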

there are different flags and flag semantics, which have not as of yet been

this area is the point of greatest divergence in the current parsing and
processing logic...
another area is in the handling of namespaces (not fully resolved thus far).

the compiler will presently allow things to be done which are technically
not allowed in the respective languages:
using namespaces as an import mechanism in C++ (though, unless supported
explicitly, this would not allow importing types);
declaration of top-level and namespace-scoped variables and functions in C#;
use of a textual, C-style preprocessor in both Java and C#;

>> I have determined that, due to semantic and architectural issues, I can't
>> embed the metadata directly into the object modules (the reason being that
>> COFF and ELF modules are loaded as needed, but for technical reasons all of
>> the metadata needs to be available to the runtime prior to the linking
>> process).
> What exactly is "metadata"?

information which describes things like:
all of the namespaces, classes (and class layouts), interfaces, functions
and signatures, ...

all of this stuff needs to be available for the runtime and compilers to
work properly (in part due to C# and Java not using the "include
teh-crapload of text" approach taken by C and C++...).

it is the same sort of thing which .NET drags along with its assemblies.
in Java (in the "proper"/JVM sense), this info is usually stored in the
class files along with the bytecode.

originally, I had wanted to store all of this in the object files, and so
when linked all this info would be conveniently embedded in the image along
with all the other code and data.

but, as a consequence of certain things being done at link time, and linking
being incremental in my framework, this approach could not be used (the
metadata would then need to be in a form which can be accessed apart from
having to link the image).
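to make the "accessible apart from linking" requirement concrete, here is a minimal C sketch of metadata records written out as plain tab-separated lines, which a runtime could read without touching the object files. the record layout, MetaEntry, meta_format, and the signature string format are all assumptions for illustration, not the format actually used.

```c
#include <stdio.h>

/* Hypothetical record kinds, following the list of metadata above. */
typedef enum { META_NAMESPACE, META_CLASS, META_INTERFACE, META_FIELD, META_FUNCTION } MetaKind;

typedef struct {
    MetaKind    kind;
    const char *name;  /* fully qualified, e.g. "Foo.bar" */
    const char *sig;   /* type or signature string (hypothetical encoding); "" if none */
} MetaEntry;

static const char *meta_kind_name(MetaKind k)
{
    switch (k) {
    case META_NAMESPACE: return "namespace";
    case META_CLASS:     return "class";
    case META_INTERFACE: return "interface";
    case META_FIELD:     return "field";
    case META_FUNCTION:  return "function";
    }
    return "?";
}

/* Format one entry as "kind<TAB>name<TAB>sig<NL>".
 * Returns the number of characters that would be written (snprintf semantics). */
int meta_format(const MetaEntry *e, char *buf, size_t size)
{
    return snprintf(buf, size, "%s\t%s\t%s\n", meta_kind_name(e->kind), e->name, e->sig);
}
```

lines like these could go into a side file next to each object module, so the runtime can load every table up front while the code itself is still linked lazily.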

note that unlike in a more traditional C++ compile/link process, a lot of
info (such as the physical in-memory layout of objects) is not directly
handled by the compiler, but is instead left to dynamic link-time (OTOH, C
structs/unions are fixed at compile time).
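deferring object layout to link time means something like the following would run when a class is linked rather than when it is compiled. this is a hypothetical sketch (FieldDesc, layout_class), assuming fields are laid out in declaration order after the base class and that alignments are powers of two; a real runtime may reorder fields or treat base-class alignment differently.

```c
#include <stddef.h>

/* Hypothetical field descriptor, as it might appear in the class metadata. */
typedef struct {
    const char *name;
    size_t size;
    size_t align;   /* must be a power of two */
    size_t offset;  /* filled in by layout_class() */
} FieldDesc;

/* Round n up to a multiple of a (a is a power of two). */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Assign offsets in declaration order, starting after the base (superclass) part.
 * Returns the instance size, rounded up to the largest field alignment. */
size_t layout_class(size_t base_size, FieldDesc *fields, size_t count)
{
    size_t off = base_size, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        off = align_up(off, fields[i].align);
        fields[i].offset = off;
        off += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    return align_up(off, max_align);
}
```

with a scheme like this, adding a field to a base class would only change numbers computed at link time, rather than requiring already-compiled subclasses to be rebuilt.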

>> further context:
>> portions of the runtime may register themselves with the linker, where a
>> request for a particular piece of information is embedded in a symbol (sort
>> of like in HTTP CGI requests), and so when a module is linked, the runtime
>> may receive the request and generate any code or data necessary to fulfill
>> this request.
> That's a logical extension of the selectable frontend (language...).

I think something like this is likely needed to be able to compile a Java or
C#-like language to native-code object files (either that, or creating a
custom object format which behaves similarly to Java class files, rather
than acting like good old COFF or ELF...).
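the symbol-as-request mechanism described in the quoted text might look something like this in outline. in this hypothetical C sketch, a runtime component registers a symbol prefix, and when the linker meets an unresolved symbol such as "__rtreq_sizeof?Foo" it splits off the part after '?' and hands it to the matching handler. the '?' separator, the "__rtreq_" prefix, and all function names are assumptions; the actual encoding was not specified.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler: receives the query part of the symbol and writes a result. */
typedef int (*ReqHandler)(const char *query, char *out, size_t outsz);

typedef struct { const char *prefix; ReqHandler fn; } ReqEntry;

#define MAX_HANDLERS 16
static ReqEntry handlers[MAX_HANDLERS];
static int n_handlers;

/* A runtime component registers interest in symbols beginning with 'prefix'. */
int register_handler(const char *prefix, ReqHandler fn)
{
    if (n_handlers >= MAX_HANDLERS)
        return -1;
    handlers[n_handlers].prefix = prefix;
    handlers[n_handlers].fn = fn;
    n_handlers++;
    return 0;
}

/* Called by the (hypothetical) linker for an unresolved symbol.
 * Splits "prefix?query" at '?' and dispatches; returns -1 if nothing matches. */
int dispatch_request(const char *symbol, char *out, size_t outsz)
{
    const char *q = strchr(symbol, '?');
    if (!q)
        return -1;
    size_t plen = (size_t)(q - symbol);
    for (int i = 0; i < n_handlers; i++)
        if (strlen(handlers[i].prefix) == plen && strncmp(handlers[i].prefix, symbol, plen) == 0)
            return handlers[i].fn(q + 1, out, outsz);
    return -1;
}

/* Example handler: a real one would generate code or data; this one just echoes the query. */
int example_handler(const char *query, char *out, size_t outsz)
{
    return snprintf(out, outsz, "data:%s", query);
}
```

the linker then resolves the original symbol to whatever the handler generated, so the requesting module never needs to know which part of the runtime answered it.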

actually, I could embed a lot of this kind of data in COFF or ELF files via
the use of special purpose sections, but this would require a little work
(and further creation of special linking tools, as almost invariably linking
it with something like GNU ld would mess everything up...). as is, partial
linking via LD would be allowed (although there is not much reason to do
so...), but at the cost that if the tables are misplaced, it may not be
possible to properly link or load the code...

> The FreePascal compiler has an interesting model for dealing with
> different target widgetsets, machines and systems. cpp will have a
> similar model (dunno). You can declare abstract classes or interfaces
> for your AST nodes and targets, and instantiate the appropriate
> object(s) when the language, library type etc. is known.

actually, by the time most of the metadata comes into question, the
compiler is out of the process (the compiler runs, and spews out object code
and tables).

the linker and runtime use this information, but are physically disjoint
from the compiler.
(the compiler may also use some of this info from libraries, but mostly to
answer really basic questions like "is 'Foo' a class?", "what is the type of
Bar.z?", ...).
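the kind of basic question the compiler asks of library metadata can be sketched as simple table lookups. the tables, meta_is_class, and meta_field_type below are hypothetical; in practice the data would be loaded from the library's metadata rather than compiled in.

```c
#include <string.h>

typedef struct { const char *cls; const char *field; const char *type; } FieldInfo;

/* Hypothetical tables, as if loaded from a library's metadata. */
static const char *known_classes[] = { "Foo", "Bar", NULL };
static const FieldInfo known_fields[] = {
    { "Bar", "z", "double" },
    { "Foo", "x", "int" },
    { NULL, NULL, NULL }
};

/* "is 'Foo' a class?" */
int meta_is_class(const char *name)
{
    for (const char **c = known_classes; *c; c++)
        if (strcmp(*c, name) == 0)
            return 1;
    return 0;
}

/* "what is the type of Bar.z?"; returns NULL if the field is unknown. */
const char *meta_field_type(const char *cls, const char *field)
{
    for (const FieldInfo *f = known_fields; f->cls; f++)
        if (strcmp(f->cls, cls) == 0 && strcmp(f->field, field) == 0)
            return f->type;
    return NULL;
}
```

queries of this sort need only names and types, not layouts, which fits with the compiler leaving physical layout to link time.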

this issue, however, does make implementing templates/generics look a little
scary (since it is not entirely clear how to instantiate a generic without
having to call back into the compiler, which I regard as ugly...).

but, at least on the upside:
by the time the machinery is in place for instantiating generics, the
machinery would also be in place for handling expression-level eval (at
present, 'eval' can only be done at the module or function level...).
