Re: How to store an AST

Steve Mading <>
7 May 1999 01:12:01 -0400

          From comp.compilers

Related articles
How to store an AST (Salvador V. Cavadini) (1999-04-30)
Re: How to store an AST (Steve Mading) (1999-05-07)
Re: How to store an AST (Dave Hanson) (1999-05-07)
Re: How to store an AST (Mike Harrison) (1999-05-09)
Re: How to store an AST (Nils M Holm) (1999-05-16)
Re: How to store an AST (Willy Jacobs) (1999-05-16)
| List of all articles for this month |

From: Steve Mading <>
Newsgroups: comp.compilers
Date: 7 May 1999 01:12:01 -0400
Organization: University of Wisconsin, Madison
References: 99-04-120
Keywords: design

Salvador V. Cavadini <> wrote:
: Hi,
: I'm programming a front-end for a tool and I need to store in disk the
: AST produced by the parser. The rest of the tool will randomly access
: the AST's nodes. I would be greatful if someone can recommend an
: efficient way for store the AST.

For the most part, you can just dump out the structures as-is with
binary read/write but you need to be careful of pointers. Any time
you have some data that is pointed at by a pointer somewhere, you
should assume that the data will be in a different memory location
when you reload it later. This invalidates the pointers that pointed
at it.

So, to handle pointers, you have to have some kind of translation
scheme to turn the pointers into some other kind of 'handle'. This is
where things get very ugly. There are two basic kinds of solution:

1 - Use a scheme that is aware of the AST structure.
2 - Use a generic scheme that can handle any kind of pointer.

Scheme 1 is simpler to implement, but dependant upon the AST layout.
(For example, if you know that the *only* pointers in your AST are the
ones that link nodes together, you could write out the AST nodes in a
depth-first fashion, with special delimiters to tell it when to pop
'up' a depth level. Then you know how to re-link the node pointers
later when you load it.)

Scheme 2 is more complex to implement, but allows for any future
expansion, and handles all the pointers in your AST, not just the
pointers to nodes.
      1 - Build a 'reference table' in memory as you write out the AST to the
      file. This table maps each pointer variable to a byte offset in the
      2 - At the start of the file, prepend the 'reference table'.
      3 - When loading, copy the reference table into memory, and then
      populate the third column of it as you go (see below).

      Example of the reference table (I'm just making the numbers up):
            What is now Originally was New location when
            at this byte at this memory loaded into memory.
            offset in location in (populated as the nodes
            the file: RAM: get loaded from the file)
          ------------- -------------- -------------------------
            0x0 0x00908f01 0x00ba9015
            0x40 0x00908f46 0x00ba9051
            0x91 0x0090b416 0x00ba9a12
            0xb4 0x0090e100 0x00ba9fa0
            ... etc ....

      4 - When loading, each time there is a pointer in the structure,
      replace its contents with the new memory location for the item that was
      pointed at. This is what the 'reference table' can help with. For
      each pointer address, look it up on the reference table and replace it
      with the new address.

      5 - To really make this truly generic for *any* pointer, you also need
      to keep track of where the pointers variables in the file were at. (so
      the loading algorithm doesn't need to know ahead of time which things
      are pointers to be replaced and which are data to be left alone.) I
      didn't show how to do that here. It could be done with a second table
      similar to the reference table.

(It's actually a lot more work than that, but I'm just giving an
overview. This sort of thing is 'icky' and I often wonder why there
isn't a standard library routine for dumping complex memory structures
to disk.)

(Another consideration to worry about is cross-platform type
conversions. If you know that the hardware and OS that writes the
file will always be the same as the one that reads the file, then
everything is okay. If not, then you have to start worrying about
sizes of integers, byte ordering, and all those other annoyances.
This is true in general for any sort of 'dumb the structs' kind of

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.