Re: Strahler numbers

"BGB / cr88192" <>
Wed, 21 Jul 2010 09:31:37 -0700

          From comp.compilers

Related articles
Strahler number and register allocation (Olaf Krzikalla) (2010-07-13)
Re: Strahler number and register allocation (Tomasz Kowaltowski) (2010-07-14)
Re: Strahler numbers (Tomasz Kowaltowski) (2010-07-15)
Re: Strahler numbers (George Neuner) (2010-07-16)
Re: Strahler numbers (Tomasz Kowaltowski) (2010-07-21)
Re: Strahler numbers (BGB / cr88192) (2010-07-21)
Re: Strahler numbers (2010-08-02)
| List of all articles for this month |

From: "BGB / cr88192" <>
Newsgroups: comp.compilers
Date: Wed, 21 Jul 2010 09:31:37 -0700
References: 10-07-014 10-07-015 10-07-018 10-07-020
Keywords: analysis, registers, history
Posted-Date: 22 Jul 2010 13:35:25 EDT

>>> [How does it compare to Sethi-Ullman numbering? -John]
>>Look at :-).
>>-- Tomasz
>>[Oh, ah, er, right. -John]
> That post doesn't explain much. AFAIK, Ershov's (1958) work
> introduced Strahler's work (1952) into computing by applying it to the
> evaluation of expression trees. Ershov used it to identify what he
> called the "register function" of an expression.
> IMO, they ought really to be called Horton numbers because Robert
> Horton first introduced them as an analysis technique for stream
> mapping in a seminal paper in 1945. Horton died shortly afterward,
> and it was Strahler who went on to apply Horton's methodology and
> develop it into a general statistical model of streams.


just my personal thoughts on this subject.

these strategies I would think would seem to have limited applicability,
namely, they would only generally directly apply in situations where:
the compiler is dealing with full expressions in lower levels of the
compiler (like, the thing has not switched to RPN or TAC or similar before
this point);
the specific types or operations may cost additional registers (say, a
certain operation on a certain type is not directly applicable to a given
CPU, and simulating this operation requires several additional temporary

another limitation could be on architectures where the number of registers
is particularly limited, and most non-trivial operations will not fit in the
CPU registers anyways.

although, many of these issues likely could be addressed, I am not certain
if this has much direct applicability to many/most compiler designs (where
the AST is long dead by the time one gets to the codegen).

admittedly, I do register allocation typically a little differently.

some older parts of my codegen used fixed-assigned registers, where there
was informal assignment of certain registers to certain tasks, and the code
would basically kick-out whatever was there, and lay claim to these
registers. although simple to implement, this strategy was a bit
"unfriendly", and didn't generally lead to good output code (although,
performance was not as bad as it would seem). as a result, most code using
this strategy has gone away (what few cases are left are mostly due to
certain CPU operations requiring certain registers).

the next development then was a slight variation:
concrete registers were used, but were referenced via handles in the
so, one would allocate a register by seeing if any were free at the moment
(there often are), and failing this, picking a register (typically by using
an index into a circular array of registers, this list being arch-specific)
and kicking out whatever is there (if possible), and using this register.

typically, to make the above strategy work well, several compiler passes
were needed, where the codegen has to essentially work in lock-step, and the
final pass actually emits code. typically, inter-pass data was via flags and
bitmaps used to mark out internal state (such as which callee-save registers
were used and need to be preserved, where conflicts or problems popped up,

more recently, code has been moving away from the above strategy as well
(the multi-pass lock-step solution is not ideal).

the more recent development is to instead start to use "tvars" (essentially,
anonymous temporary variables), which typically alias themselves to
registers, but also typically have a (temporarily) assigned location on the
stack as well (if their aliased register needs to be flushed, they are
preserved into this stack location, and may be restored later). unlike
registers, there is no particular upper bound to the number of tvars.

they share the same handle-space as registers, and so can also be passed to
most of the same functions (typically, tvars are used at present along with
the prior strategy).

a considered move is to go a little further, and abstract the tvars from
their stack location. in this case, a given tvar handle will only be used
once, and so will itself have a limited lifespan (this will avoid the
present issue of tvars seeming to pass out of existance, and then apparently
popping back into existence, but in actuality being unrelated variables).

the reason for doing this would be to allow, potentially, keeping track of
both past and future tvar-related activities, and then using them as hints
to the register allocator (rather than having it act purely in response to
the present state).

similarly, it could also be reasonable to abstract the RPN stack, pretty
much entirely, from the CPU stack, and instead treat the RPN stack as a
stack of tvars (where each push will create a new tvar). also within reason
could be to allow tvars to be immutable (either explicitly, or flagging any
which are never modified).

as-is, the RPN stack is still partly aliased to the CPU stack in terms of
ordering, but this is unecessary.

another more recent development a little higher up the chain may also open
the possibility of phasing out RPN entirely:
for other (unrelated) reasons, I have decided to start using "generalized
ASTs" (or "GASTs") as an IR, in place of the use of RPN (the code for
converting AST's into RPN, is then, oddly enough, moved into the same place
as the code for processing RPN and managing the RPN stack). this opens up
the possibility of eventually dropping use of RPN altogether.

GASTs are basically a partially language-scrubbed version of the ASTs.
normally, the compiler upper-end parses and produces ASTs with a lot of
language-specific stuff, and typically type conventions somewhat specific to
the source language (such as "char" being 8 bits in C or C++, but 16 bits in
Java and C#, as well as somewhat different usage and meanings of various
type-modifiers). in GASTs, the types are replaced by the more concrete
normalized forms (type signature strings) generally used elsewhere. likely
any particularly source-language specific constructs can also be translated
into a more "neutral" form at this stage.

the reason for considering creating and using GASTs, instead of an RPN-based
IL, is because a GAST is a little bit more neutral of the compiler's (or an
interpreter's) internal structure (whereas RPN faces one with more matters
of ordering, type-handling, ... than does an AST).

the conversion from AST to GAST also involves using the "reducer" (dunno if
there is a better term, this is just what I call it internally), which
basically tries to rewrite more complex expressions into simpler ones
(although in a very high-level and general way, as it does not attempt any
target-specific optimizations).

examples of reducer operations:
evaluating constant expressions;
algebraic simplification of known-safe expressions;
eliminating if/else branches if the conditional is constant;
eliminating "while(0) {...}" blocks and similar;

some things are deliberately not done here:
loop unrolling, inlining, ... (these would be left for a later stage).

previously, the reducer was used mostly as a first-stage optimizer prior to
producing IL, but in this case also serves the partial role of assisting
with generating GAST's.

but, OTOH, I did end up using XML for GAST's (since that is what my
upper-end uses internally), which does further plague other parts of the
compiler with XML. although, I am using a binary serialization (vaguely
similar to WBXML), rather than plaintext, for moving said XML (it also still
leaves open the possibility of alternative parsers which don't create
internal XML node-trees).

the main alternative is S-Expressions, but there are relative merits to both
(s-exps are faster, more compact, and a little easier to work with, but XML
tends to be a little more flexible IME, whereas to do similar in s-exps
would likely require "unorthodox" s-exps and potentially a higher overhead
than XML).

I had also considered table-driven (representing IL as a big database-like
structure, like MSIL but taken further in that the "bytecode" is also
table-driven) and TLV-based structures (vaguely similar to Matroska), as
well as a single table of "objects" (like in JBC or many persistent stores),
but these generally posed somewhat greater issues. binary XML was the
simplest and most flexible option, even if not necessarily the highest
performance option.

but, now, will Strahler/... numbers likely matter to register allocation,
who knows, probably not.

likely they would require a fairly simplistic compiler for an architecture
with an acceptably large number of registers (as to be able to handle
complete expressions).

something like predicting the number of temporary variables is possible, but
I would suspect of more limited usefulness. I don't know...

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.