Re: Generic AST in XML for any language

"BGB / cr88192" <>
Sat, 13 Mar 2010 12:37:52 -0700

          From comp.compilers


From: "BGB / cr88192" <>
Newsgroups: comp.compilers,comp.lang.c++
Date: Sat, 13 Mar 2010 12:37:52 -0700
References: 10-03-020
Keywords: analysis, XML, UNCOL
Posted-Date: 13 Mar 2010 15:05:02 EST

"Kalahan" <> wrote in message
> Does anyone knows if there is such thing as an standard to represent
> the basic elements of a language (functions, variables, classes)? And
> generated in XML?
> I know that the title might be misleading about the meaning of an AST
> but I have a project in mind and I don't want to replycate work. Also
> that might be aiming too high if we start adding functional languages,
> aspect oriented programming, etc
> Also I would appreciate if you could point me to projects where I can
> get a good XML representation of a source file.

I use XML internally for several of my frontends.

but, alas, there is nothing really standard about it, nor does it
extend to "any language". usually, one will have to live with a
situation where many pieces of the syntax and semantics will vary from
one language to the next, and so different frontends would necessarily
produce ASTs with differing contents and differing meanings.

admittedly, within a narrow family of languages there is a lot of overlap, so
more can be similar than different:
for example: C, C++, Java, C#, and maybe ECMAScript (JavaScript and
ActionScript) could all use an essentially very similar AST structure.
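
as a rough illustration of what such a shared C-family structure might look like, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. the tag and attribute names (function, args, var, body, return, binary, ref) are invented for this example and are not the schema my frontends actually use; the tree corresponds to something like "int add(int a, int b) { return a + b; }".

```python
import xml.etree.ElementTree as ET

# NOTE: every tag/attribute name below is hypothetical, chosen only
# to show the shape of a C-family AST expressed as XML.

def ref(name):
    """Reference to a named variable."""
    return ET.Element("ref", {"name": name})

def binary(op, left, right):
    """Binary expression with the operator stored as an attribute."""
    node = ET.Element("binary", {"op": op})
    node.append(left)
    node.append(right)
    return node

def ret(expr):
    """Return statement wrapping a single expression."""
    node = ET.Element("return")
    node.append(expr)
    return node

def function(name, ret_type, params, body):
    """Function definition: typed arguments plus a list of statements."""
    fn = ET.Element("function", {"name": name, "type": ret_type})
    args = ET.SubElement(fn, "args")
    for pname, ptype in params:
        ET.SubElement(args, "var", {"name": pname, "type": ptype})
    block = ET.SubElement(fn, "body")
    for stmt in body:
        block.append(stmt)
    return fn

add_fn = function("add", "int", [("a", "int"), ("b", "int")],
                  [ret(binary("+", ref("a"), ref("b")))])
xml_text = ET.tostring(add_fn, encoding="unicode")
print(xml_text)
```

the same node shapes would plausibly serve for the Java or C# version of this function; the differences only start to show once types, references, and object semantics come into play.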

however, once it comes down to the specifics of each language, the
potentially drastic semantic differences come up.

for example, if the C is still to be valid C, the Java still valid Java, and
the JS valid JS, then some pain begins, as these languages each manage
things like types, memory references, ... very differently, and eventually
these issues will need to be addressed.

in many cases, common ground can be found, and one can address some issues
via simple internal translation, but in many other cases it is less trivial,
and one ends up having to use a "common superset" strategy for many parts of
the backend.

for example, one may end up dealing with maybe around 8+ different basic
array types, several different variations as to how to manage OO features
(C++ vs Java vs C# vs JS).
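
the two strategies above can be sketched roughly as follows (Python, standard xml.etree.ElementTree). everything here is an assumption made up for illustration: the arrow, member, and deref tags, the "kind" attribute, and the two array kinds are hypothetical and are not taken from my compiler. lower_arrow shows "simple internal translation" (C's p->x rewritten to the equivalent (*p).x, so later stages only see one form), and ARRAY_LAYOUTS shows the "common superset" idea, where the backend simply has to know about every variant that cannot be merged.

```python
import xml.etree.ElementTree as ET

def lower_arrow(node):
    """Internal translation: rewrite <arrow name="f">p</arrow> in place
    into <member name="f"><deref>p</deref></member>.
    Nodes with any other tag are left untouched."""
    for child in list(node):
        lower_arrow(child)
    if node.tag == "arrow":
        pointer = node[0]
        node.remove(pointer)
        deref = ET.Element("deref")
        deref.append(pointer)
        node.append(deref)
        node.tag = "member"
    return node

# Common superset: variants that cannot be translated into one another
# are all kept, and the backend handles each one. (hypothetical kinds)
ARRAY_LAYOUTS = {
    "c-inline": "elements stored inline, no length header",
    "java-ref": "heap object with a length header, bounds-checked",
}

def array_layout(node):
    """Look up how the backend should lay out an <array kind="..."> node."""
    kind = node.get("kind")
    if kind not in ARRAY_LAYOUTS:
        raise ValueError(f"unsupported array kind: {kind!r}")
    return ARRAY_LAYOUTS[kind]

expr = ET.fromstring('<arrow name="x"><ref name="p"/></arrow>')
lowered = lower_arrow(expr)
lowered_text = ET.tostring(lowered, encoding="unicode")
print(lowered_text)
print(array_layout(ET.Element("array", {"kind": "java-ref"})))
```

the translation keeps the backend small where the languages really do agree; the lookup table is what grows (toward the 8+ entries mentioned above) where they do not.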

there may be cases where there is no single good way to do something,
leading to open-ended problems (this is an extra issue with signature
strings, since it may lead to issues like inconsistent name-mangling
behavior, extra code complexity, ...). one may also find cases of mutual
incompatibility, where neither language can directly map its data to the
other.

in other cases, things may need to be left as context dependent or ambiguous
(for example, signature strings may have some context-dependent types and
notations, ...).

something trivial in one place may also be a terrible pain in another, ...

often, the best option available is to try to be generic (keep one thing
from depending on the specifics of another, and allow things to be passed
along cleanly and easily when possible).

but, anyways, here is a current compiler dump:

it is currently (mostly) under a mix of Public Domain and MIT licensing (and
is now GPL-free), but a few parts come from Apache (mostly the Java
classlib, but I have partly started on attempting my own implementation of
the classlib). (note: Java support is not particularly tested or

a lot is still needed WRT documenting the thing, ...
