Re: What's lacking: a good intermediate form

"cr88192" <cr88192@hotmail.com>
Sat, 7 Mar 2009 09:03:39 +1000

          From comp.compilers


From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.lang.misc,comp.compilers
Date: Sat, 7 Mar 2009 09:03:39 +1000
Organization: albasani.net
References: 09-02-132 09-02-136 09-02-144 09-03-003 09-03-014 <fRhrl.13260$8_3.4266@flpi147.ffdc.sbc.com> 09-03-020 09-03-025
Keywords: code, analysis
Posted-Date: 06 Mar 2009 21:23:00 EST

"Tony" <tony@my.net> wrote
> "Robbert Haarman" <comp.lang.misc@inglorion.net> wrote in message
>> [Cross-posting so that everybody who followed the original thread gets
>> the message too.]
>>
>> I also feel that the world would benefit from a good common
>> intermediate language. In fact, I think it would be good to have a
>> number of them, at different levels and with different features. For
>> myself, and for everybody who is interested in using them, I am
>> developing at least two such languages.
>
> Note that I said "intermediate form" rather than "intermediate
> language". "Intermediate representation" (IR) is probably more common
> than "intermediate form" though. An intermediate language is generated
> code and a bit farther down the line than the IR I was concerned with,
> which is probably an AST.
>


ok:
I have usually used both S-Exps and XML/DOM trees for representing parsed
code, and each has advantages and disadvantages.




S-Exps:
fairly easy to work with (provided good API support);
are relatively efficient (both in terms of time and memory use);
one can support heterogeneous data if need-be;
...


however:
they can tend to become rigid/inflexible;
common usage patterns make manual memory management problematic (ie: lots of
garbage and trees that can't be safely freed);
...
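
to illustrate the memory management point, here is a minimal sketch in C. the Cell type and the helpers atom(), cons(), list3() and second() are made up for this example (they are not from any particular library). two trees end up sharing one subtree, so recursively freeing either tree would leave the other with a dangling pointer:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal cell: an atom (sym != NULL) or a pair (car/cdr). */
typedef struct Cell Cell;
struct Cell {
    const char *sym;   /* symbol name for atoms, NULL for pairs */
    Cell *car, *cdr;   /* only meaningful for pairs */
};

Cell *atom(const char *s) {
    Cell *c = calloc(1, sizeof *c);
    if (!c) abort();
    c->sym = s;
    return c;
}

Cell *cons(Cell *car, Cell *cdr) {
    Cell *c = calloc(1, sizeof *c);
    if (!c) abort();
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* Build the three-element list (op x y). */
Cell *list3(const char *op, Cell *x, Cell *y) {
    return cons(atom(op), cons(x, cons(y, NULL)));
}

/* Second element of a list with at least two elements. */
Cell *second(Cell *l) { return l->cdr->car; }

/* Builds (add (mul a b) c) and (sub (mul a b) d) around ONE (mul a b). */
Cell *demo_shared_subtree(void) {
    Cell *shared = list3("mul", atom("a"), atom("b"));
    Cell *t1 = list3("add", shared, atom("c"));
    Cell *t2 = list3("sub", shared, atom("d"));
    /* Both trees point at the same cells: freeing t1 recursively
       would invalidate part of t2. Nothing is freed in this sketch. */
    assert(second(t1) == second(t2));
    return shared;
}
```

nothing in the cells records who else points at them, which is why this style usually ends up wanting a GC (or reference counts, or copying everything).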




DOM:
very good at annotation and compatible extension (AKA: it is easy to add new
things without breaking existing code);
namespaces can come in rather useful sometimes;
manual memory management is usually easier;
...


however:
working with them (in C) tends to add a lot of bulk to the code, and it is
not so easy to effectively wrap them by abstract API calls;
they are not so efficient (ie: they use a lot of extra time and
memory);
...




so, for years I have been going back and forth between them.
it is almost more desirable to come up with a custom notation /
representation, which like S-Exps is fairly memory dense and easy to work
with in C, but like DOM readily supports annotation and other features.




alternatively, one could make an alternative (non-DOM) representation for
XML nodes, and allow heterogeneous mixing with S-Expressions and other data
(an alternative syntax could be developed as well).


<name attr="value" ... />


creates an "x-node", which is like an XML-node.


<foo key="stuff"/>


likewise:
<foo key="stuff">3 4 "bar"</>
<bar>(add (mul a b) c)</>


(add <ref name="x"/> 3)
...


alternative possible tweak (more to normalize the syntax and make it less
likely to be confused with XML):
<name: ... >


so:
<foo key="stuff": 3 4 "bar" (baz x y)>


likely, whitespace would be required after the colon to distinguish it from
namespaces (and context would be used to separate both from keywords).




likewise, the internal representation need not be cons cells (although I
would probably do so in my case because it would be most convenient).


note that this would not be parser-compatible with either XML or S-Exps (the
former because non-node contents are expressions rather than textual payload,
and the latter because '<', '>', '=' ... are usually regarded as valid name
characters in S-Exps, whereas this would require limiting names, possibly down
to C-style naming rules, in order to free up those characters for syntactic use).
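
as a rough sketch of what "C-style naming rules" could mean for a reader, assuming names are plain ASCII and that the rule is simply a letter or underscore followed by letters, digits or underscores (the function name is_c_style_name is made up here):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s matches [A-Za-z_][A-Za-z0-9_]* (in the "C" locale), else 0. */
int is_c_style_name(const char *s) {
    if (s == NULL || !(isalpha((unsigned char)*s) || *s == '_'))
        return 0;                      /* NULL, empty, or bad first char */
    for (s++; *s; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;                  /* '<', '>', '=', '!' ... rejected */
    return 1;
}

void demo_names(void) {
    assert(is_c_style_name("foo_1"));
    assert(!is_c_style_name("<="));    /* a fine Lisp symbol, not allowed here */
    assert(!is_c_style_name("set!"));
}
```

the cost is visible in the demo: typical Lisp-style operator names like <= would need to be spelled some other way.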


likely (in my case), this could be implemented primarily by making a special
object type for the nodes, which would hold the attributes and use a list
for the contents.


XNode {
symbol ns, name;    /* namespace and tag name */
list attr, body;    /* attributes, and a list for the contents */
}
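
spelled out in plain C, this might look roughly like the following. this is only a sketch under assumptions: symbols are stood in for by C strings, the attribute list is a simple linked list, the body is left as an opaque pointer, and the xnode_* function names are invented for the example:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Attr {
    const char *key, *value;
    struct Attr *next;
} Attr;

typedef struct XNode {
    const char *ns, *name;   /* ns may be NULL when there is no namespace */
    Attr *attr;              /* attribute list */
    void *body;              /* contents; representation left open here */
} XNode;

XNode *xnode_new(const char *ns, const char *name) {
    XNode *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->ns = ns;
    n->name = name;
    return n;
}

/* Set (or overwrite) an attribute. Strings are borrowed, not copied. */
void xnode_set_attr(XNode *n, const char *key, const char *value) {
    Attr *a;
    for (a = n->attr; a; a = a->next)
        if (strcmp(a->key, key) == 0) { a->value = value; return; }
    a = calloc(1, sizeof *a);
    if (!a) abort();
    a->key = key;
    a->value = value;
    a->next = n->attr;
    n->attr = a;
}

/* Look up an attribute; returns NULL if it is not present. */
const char *xnode_get_attr(const XNode *n, const char *key) {
    const Attr *a;
    for (a = n->attr; a; a = a->next)
        if (strcmp(a->key, key) == 0) return a->value;
    return NULL;
}

/* Builds the equivalent of <foo key="stuff"/>. */
XNode *demo_xnode(void) {
    XNode *n = xnode_new(NULL, "foo");
    xnode_set_attr(n, "key", "stuff");
    assert(strcmp(xnode_get_attr(n, "key"), "stuff") == 0);
    return n;
}
```

code that only asks for the attributes it knows about keeps working when a later pass adds more of them, which is the annotation property borrowed from DOM.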


...




it is just unclear exactly how much to "generalize" this (for example, could
attributes be made a common syntactic element, ...).
likewise, it would require developing clear "usage conventions", for
example, noting that S-Exps and XML are traditionally used differently
(S-Exps are usually positional, whereas XML usually uses sub-tag names), ...




other thoughts would be to have something analogous to CDATA, but with a
lighter syntax, ...
[[ text ]]


where these brackets could be allowed within the text so long as they are
evenly matched, but otherwise would be escaped (mental uncertainty over
ideal escape sequences...). a possible tweak is to also allow merging
adjacent blocks (sort of like C strings). hmm: '#\[[', '#\]]' escape brackets,
'#\\' escape self, '#\uXXXX' and '#\UXXXXXXXX' escape unicode, '#\xXX'
escape ASCII.
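
a sketch of how a reader might find the end of such a block, assuming the tentative escapes above are the ones used. block_end is a made-up name, and the code only locates the closing ]] (it does not decode the escapes or merge adjacent blocks):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* s must start with "[[". Returns the index just past the matching "]]",
   or -1 if s does not start a block or the block is unterminated. */
long block_end(const char *s) {
    size_t i = 0;
    int depth = 0;

    if (strncmp(s, "[[", 2) != 0)
        return -1;
    while (s[i]) {
        if (s[i] == '#' && s[i + 1] == '\\') {
            if (strncmp(s + i + 2, "[[", 2) == 0 ||
                strncmp(s + i + 2, "]]", 2) == 0)
                i += 4;                /* #\[[ or #\]]: escaped brackets */
            else if (s[i + 2] == '\\')
                i += 3;                /* #\\: escaped escape prefix */
            else
                i += 2;                /* #\u..., #\x...: no brackets inside */
        } else if (s[i] == '[' && s[i + 1] == '[') {
            depth++;                   /* nested (evenly matched) open */
            i += 2;
        } else if (s[i] == ']' && s[i + 1] == ']') {
            depth--;
            i += 2;
            if (depth == 0)
                return (long)i;        /* matching close found */
        } else {
            i++;
        }
    }
    return -1;                         /* ran out of input */
}

void demo_block_end(void) {
    assert(block_end("[[ text ]]") == (long)strlen("[[ text ]]"));
    assert(block_end("[[ a [[ b ]] c ]] rest") == 17);
    assert(block_end("[[ a #\\]] b ]]") == 14);   /* text is: [[ a #\]] b ]] */
}
```

note that "evenly matched" costs the scanner nothing more than a depth counter; it is the escapes that need the extra cases.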


could also use C-style comments rather than either XML or Lisp style
comments.


...




of course, all this creates a different pros and cons list than either XML
or S-Exps...




thoughts?...

