Related articles |
---|
How to store an AST salvador@ucseii.edu.ar (Salvador V. Cavadini) (1999-04-30) |
Re: How to store an AST madings@baladi.nmrfam.wisc.edu (Steve Mading) (1999-05-07) |
Re: How to store an AST drh@microsoft.com (Dave Hanson) (1999-05-07) |
Re: How to store an AST mph@zdenka.demon.co.uk (Mike Harrison) (1999-05-09) |
Re: How to store an AST nmh@dialup.nacamar.de (Nils M Holm) (1999-05-16) |
Re: How to store an AST WN.Jacobs@net.HCC.nl (Willy Jacobs) (1999-05-16) |
From: | Steve Mading <madings@baladi.nmrfam.wisc.edu> |
Newsgroups: | comp.compilers |
Date: | 7 May 1999 01:12:01 -0400 |
Organization: | University of Wisconsin, Madison |
References: | 99-04-120 |
Keywords: | design |
Salvador V. Cavadini <salvador@ucseii.edu.ar> wrote:
: Hi,
: I'm programming a front-end for a tool and I need to store in disk the
: AST produced by the parser. The rest of the tool will randomly access
: the AST's nodes. I would be greatful if someone can recommend an
: efficient way for store the AST.
For the most part, you can just dump out the structures as-is with
binary read/write but you need to be careful of pointers. Any time
you have some data that is pointed at by a pointer somewhere, you
should assume that the data will be in a different memory location
when you reload it later. This invalidates the pointers that pointed
at it.
So, to handle pointers, you have to have some kind of translation
scheme to turn the pointers into some other kind of 'handle'. This is
where things get very ugly. There are two basic kinds of solution:
1 - Use a scheme that is aware of the AST structure.
2 - Use a generic scheme that can handle any kind of pointer.
Scheme 1 is simpler to implement, but dependant upon the AST layout.
(For example, if you know that the *only* pointers in your AST are the
ones that link nodes together, you could write out the AST nodes in a
depth-first fashion, with special delimiters to tell it when to pop
'up' a depth level. Then you know how to re-link the node pointers
later when you load it.)
Scheme 2 is more complex to implement, but allows for any future
expansion, and handles all the pointers in your AST, not just the
pointers to nodes.
1 - Build a 'reference table' in memory as you write out the AST to the
file. This table maps each pointer variable to a byte offset in the
file.
2 - At the start of the file, prepend the 'reference table'.
3 - When loading, copy the reference table into memory, and then
populate the third column of it as you go (see below).
Example of the reference table (I'm just making the numbers up):
----------------------------------------------------------------
What is now Originally was New location when
at this byte at this memory loaded into memory.
offset in location in (populated as the nodes
the file: RAM: get loaded from the file)
------------- -------------- -------------------------
0x0 0x00908f01 0x00ba9015
0x40 0x00908f46 0x00ba9051
0x91 0x0090b416 0x00ba9a12
0xb4 0x0090e100 0x00ba9fa0
... etc ....
4 - When loading, each time there is a pointer in the structure,
replace its contents with the new memory location for the item that was
pointed at. This is what the 'reference table' can help with. For
each pointer address, look it up on the reference table and replace it
with the new address.
5 - To really make this truly generic for *any* pointer, you also need
to keep track of where the pointers variables in the file were at. (so
the loading algorithm doesn't need to know ahead of time which things
are pointers to be replaced and which are data to be left alone.) I
didn't show how to do that here. It could be done with a second table
similar to the reference table.
(It's actually a lot more work than that, but I'm just giving an
overview. This sort of thing is 'icky' and I often wonder why there
isn't a standard library routine for dumping complex memory structures
to disk.)
(Another consideration to worry about is cross-platform type
conversions. If you know that the hardware and OS that writes the
file will always be the same as the one that reads the file, then
everything is okay. If not, then you have to start worrying about
sizes of integers, byte ordering, and all those other annoyances.
This is true in general for any sort of 'dumb the structs' kind of
solution.)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.