Re: xml as intermediate representation

"Vidar Hokstad" <>
23 Sep 2005 15:54:28 -0400

          From comp.compilers

Related articles
xml as intermediate representation (tanuj) (2005-09-17)
Re: xml as intermediate representation (Jürgen Kahrs) (2005-09-17)
Re: xml as intermediate representation (Jeff Kenton) (2005-09-22)
Re: xml as intermediate representation (TOUATI Sid) (2005-09-22)
Re: xml as intermediate representation (Chris Dollin) (2005-09-23)
Re: xml as intermediate representation (Vidar Hokstad) (2005-09-23)
Re: xml as intermediate representation (Alexey Demakov) (2005-09-27)
Re: xml as intermediate representation (Vidar Hokstad) (2005-09-27)
Re: xml as intermediate representation (TOUATI Sid) (2005-09-30)

From: "Vidar Hokstad" <>
Newsgroups: comp.compilers
Date: 23 Sep 2005 15:54:28 -0400
Organization: Compilers Central
References: 05-09-078 05-09-101
Keywords: analysis, optimize
Posted-Date: 23 Sep 2005 15:54:28 EDT

Jeff Kenton wrote:
> tanuj wrote:
> > I am writing a compiler as a part of my course project and wanted to
> > use xml as an intermediate representation language. All the coding is
> > being done in C.

For C I'd look at libxml for the XML parsing. See Also, consider taking a look at - it's a programming language that uses XML as
its native form, and may give you some ideas on how to structure

> Why use XML? The only value I can see is that it's easy to eyeball
> your IR, and you could get that by writing a routine to dump the IR
> in some convenient format (which you should have anyway).

- XML can be manipulated by a vast array of tools.

- XSL can be used to write transformations quickly and easily, whether
to test things or to provide additional functionality, including
dumping the IR in a more human-readable format without having to
hardwire the format into the compiler.

- Static analysis tools and a vast array of other tools become far
easier to write because there's a huge toolchain to assist in
manipulating the data.

- Writing tools for automated instrumentation or for implementing
things like aspects separate from the core language by transforming the
IR is trivial.

- It's easy to build the compiler as a set of independent filtering
modules and to replace any one of them without recompiling, or even to
prototype modules in other languages or hand-write test input for
specific passes. This works great for unit testing specific modules
without having to write tons of test code to build the IR structures
manually (of course you could write a custom parser, but why do that
when XML gives it to you for free?).
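
As a rough illustration of the "independent filtering modules" point, here
is a minimal sketch of a single pass (constant folding) that takes XML text
in and returns XML text out. It is a Python prototype using only the
standard library. The element names (const, var, add, mul), the value and
name attributes, and the helpers fold and run_pass are all invented for this
sketch; they are not taken from any existing IR.

```python
import xml.etree.ElementTree as ET

# Hypothetical IR vocabulary, invented for this sketch:
#   <const value="3"/>   integer literal
#   <var name="x"/>      variable reference
#   <add>, <mul>         binary operators with exactly two children
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def fold(node):
    """Return a copy of node with constant subexpressions folded, bottom-up."""
    children = [fold(child) for child in node]
    if (node.tag in OPS and len(children) == 2
            and all(c.tag == "const" for c in children)):
        left, right = (int(c.get("value")) for c in children)
        return ET.Element("const", {"value": str(OPS[node.tag](left, right))})
    # Not foldable: rebuild the element around the (possibly folded) children.
    # Text between elements is not copied; this toy IR only uses attributes.
    folded = ET.Element(node.tag, dict(node.attrib))
    folded.extend(children)
    return folded


def run_pass(xml_text):
    """One pass of the pipeline: XML text in, XML text out."""
    return ET.tostring(fold(ET.fromstring(xml_text)), encoding="unicode")


# Hand-written test input for this pass alone; no front end is involved.
SAMPLE = ('<add><var name="x"/>'
          '<mul><const value="2"/><const value="3"/></mul></add>')
```

Calling run_pass(SAMPLE) replaces the mul subtree with a single const element
and leaves the var untouched. Wrapping run_pass around standard input and
output would turn it into a filter that can sit between two other passes.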

> But the drawbacks to using XML far outweigh the minor convenience.
> First, it's too verbose for anything but toy programs. Second, if
> you are planning to manipulate it in C, you're going to need a
> corresponding C representation for it internally.

Why does it matter if it's verbose? Besides, verbosity mostly comes from
a poor choice in how you organise your XML. Have you ever looked at the
RTL dumps from GCC? They run to about 100KB of output per pass, on
average, for a 6KB C source, so they are hardly terse either (though of
course RTL is a very low-level representation). Making an XML
representation that is no more verbose than that is hardly difficult.
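
To make the point about organisation concrete, here is a small sketch that
encodes the same three-address instruction in two ways and decodes both to
the same tuple. Both encodings, the attribute names d, a and b, and the two
decode helpers are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# The same "r1 = r2 + r3" instruction in two hypothetical encodings.
VERBOSE = (
    "<instruction><opcode>add</opcode>"
    "<destination><register>r1</register></destination>"
    "<source><register>r2</register></source>"
    "<source><register>r3</register></source>"
    "</instruction>"
)
TERSE = '<add d="r1" a="r2" b="r3"/>'


def decode_verbose(text):
    """Decode the element-per-field encoding to (opcode, dest, src1, src2)."""
    root = ET.fromstring(text)
    sources = [s.findtext("register") for s in root.findall("source")]
    return (root.findtext("opcode"),
            root.findtext("destination/register"),
            *sources)


def decode_terse(text):
    """Decode the attribute-based encoding to the same tuple."""
    root = ET.fromstring(text)
    return (root.tag, root.get("d"), root.get("a"), root.get("b"))
```

Both decoders recover the same information, while the attribute-based form
is a fraction of the size; how far to push in that direction is a design
choice, not something XML dictates.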

As for representation - he'll need a corresponding C representation
regardless of what format he uses to dump and parse it. Using XML gives
him large parts of the parsing code for free, and if he's comfortable
with wrapping his code around W3C DOM or DOM-like tree manipulation he
can avoid a lot of the work - I've used that approach successfully for
small languages in the past.
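
A minimal sketch of what "a corresponding internal representation" can look
like when the parser comes for free: a generic node type plus load and dump
functions that walk the parsed tree. It is written in Python for brevity; in
C the Node class would be a struct of the same shape. The names Node, load
and dump and the sample element names are assumptions made for this
illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """In-memory IR node: a kind, its attributes, and its child nodes."""
    kind: str
    attrs: dict = field(default_factory=dict)
    kids: list = field(default_factory=list)


def load(elem):
    """Convert a parsed XML element (and its subtree) into Node objects."""
    return Node(elem.tag, dict(elem.attrib), [load(child) for child in elem])


def dump(node):
    """Convert a Node tree back into XML elements for the next pass."""
    elem = ET.Element(node.kind, dict(node.attrs))
    elem.extend([dump(kid) for kid in node.kids])
    return elem


def count(node, kind):
    """Example traversal: count the nodes of a given kind in the tree."""
    return (node.kind == kind) + sum(count(kid, kind) for kid in node.kids)
```

The alternative the paragraph above describes is to skip Node entirely and
let the passes operate on the parsed element tree directly, at the cost of
tying every pass to the XML library's tree API.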

> I would guess you chose XML because you want to play with XML. That
> suggests that you really want two projects: a compiler project,
> which you should do with appropriate tools, and an XML project
> that's completely separate.

I absolutely don't agree. XML is great to use as a way of breaking up
a compiler, particularly for the reasons above (i.e. it's easy to
manipulate and there's a huge set of tools to work with). It's a great
boon to tool writers to have easy access to the internal data of a
compiler in a format that is easily accessible and doesn't require
them to write custom parsers or even mess around with the compiler.

[I agree that if you're going to pass stuff from phase to phase, XML
is as good a way to do it as any because of all the tools. But I
have my doubts about plug and play phases. The compilers I wrote
depend on shared data structures like symbol tables, and if you're going
to have independent phases, you're going to have to pass that shared
data from phase to phase, too, greatly bulking up the process. -John]
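
One way to picture the shared-data concern in the note above is a document
that carries its symbol table alongside the code, so that every phase
receives both. The sketch below is only that, a sketch: the unit, symbols,
sym and body elements and the annotate_types pass are hypothetical names
chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical compilation unit: the symbol table travels with the IR.
UNIT = """
<unit>
  <symbols>
    <sym name="x" type="int"/>
    <sym name="y" type="float"/>
  </symbols>
  <body>
    <add><var name="x"/><var name="y"/></add>
  </body>
</unit>
"""


def annotate_types(xml_text):
    """Pass that reads the shared symbol table and tags each var with its type."""
    unit = ET.fromstring(xml_text)
    table = {sym.get("name"): sym.get("type") for sym in unit.find("symbols")}
    for var in unit.find("body").iter("var"):
        var.set("type", table[var.get("name")])
    # The symbols section is written back out unchanged for later phases.
    return ET.tostring(unit, encoding="unicode")
```

The cost the note points to is visible here: the symbols section has to be
parsed and re-emitted by every phase, whether or not that phase uses it.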
