|Interpreted Basic control-flow analysis email@example.com (Arthur J. O'Dwyer) (2006-07-05)|
|Re: Interpreted Basic control-flow analysis DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-07-06)|
|Re: Interpreted Basic control-flow analysis firstname.lastname@example.org (Chris Dollin) (2006-07-06)|
|Re: Interpreted Basic control-flow analysis email@example.com (Arthur J. O'Dwyer) (2006-07-06)|
|From:||Hans-Peter Diettrich <DrDiettrich@compuserve.de>|
|Date:||6 Jul 2006 07:51:29 -0400|
|Posted-Date:||06 Jul 2006 07:51:29 EDT|
"Arthur J. O'Dwyer" wrote:
> I have an apparently non-trivial problem in control-flow analysis.
> I'm trying to make a static control-flow analyzer for programs written
> in TI-83 Basic, an interpreted language. The input is a Basic program,
> line by line; and the output is a mapping from lines to lists of their
> successors. (See example below.)
> The important control structures supported by TI-83 Basic are
> If-Then-Else-End, While-End, and Goto-Label.
> At runtime, the interpreter maintains a stack of all the currently
> enclosing control structures. A new entry is pushed when encountering
> If-Then or While, and popped when encountering End. Notice that Goto
> does not affect the state of the control stack --- this is the crux
> of the problem!
Please specify: do you describe the operation of your analyzer, or the
operation of the original interpreter?
> The runtime nesting of control structures isn't
> necessarily reflected by a lexical nesting.
AFAIK most BASIC interpreters embed links to related code places,
whenever a jump is required (at the end of a loop, before an ELSE...),
and do not rely on any kind of control stack. These references are
updated before the start of a program, based on a static analysis of the
> The successors of If-Then, Else, While, and Goto are always
> lexically determined and thus easy to compute. The successors of
> "End" are what I'm having trouble with.
IMO the trouble only arises from your misinterpretation of the successor
of the END of a WHILE loop. It's always the WHILE node with the loop
> Consider the following trivial example, in which the lexical nesting
> does reflect the runtime semantics:
> 1. If X successors are 2
> 2. Then " 3, 8
I'd merge (1) and (2) into an single BB.
> 3. While Y " 4, 6
> 4. A " 5
> 5. End " 4, 6
(5) only has the successor (3) - the loop test!
> 6. B " 7
> 7. End " 8
> 8. C " none
> Now consider the following example, in which the lexical nesting
> doesn't quite match the runtime behavior:
> 1. While X " 2, 4
> 2. Goto L " 6
> 3. End " unreached
In how far does it matter, whether (3) is reachable or not?
Unreachable nodes can be removed from the CFG (recursively!), e.g.
by/after a DFS for the assignment of node numbers.
> 4. While Y " 5, 7
> 5. Lbl L " 6
> 6. End " 2, 5
As above, (6) only has the successor (4).
> 7. A " none
> The "End" in line 6 corresponds to "While X" if it was reached
> via line 2 ("Goto L"), but it corresponds to "While Y" if it was
> reached via line 4 (which is where control ends up if X is false
> the first time line 1 is reached).
IMO your only mistake (so far ;-) is the wrong assumption about the
successor of the END of a WHILE loop.
> I'm looking for an algorithm that can handle this kind of language.
> I want the resulting successor list to be as conservative as possible
> --- e.g., it is not acceptable to give every line in the program as
> the successor list of every End. (I don't care what's given for
> unreachable lines like line 3 above, though.)
IMO chances for hitting really structured code are almost the same, for
a BASIC language with unrestricted GOTOs, and for assembly language.
I'd suggest you study the documentation around the dcc (C decompiler) by
Cristina Çifuentes, for techniques in handling and reduction of regular
and irregular control flow graphs. (Universities of Tasmania and
I could find only one too simplified transformation in her algorithms,
dealing with the short-circuit evaluation of complex conditional
expressions; this is not really an error, I assume that she simply had
no time left for a more general analysis ;-)
> (Possibly useful notes: I'm writing the analyzer in C.
dcc also is written in C, perhaps you can reuse parts of it. IIRC the BB
structure is a nightmare, but after stripping it down, the code for the
creation, analysis and reduction of CFGs might be reusable. I translated
the major part of dcc into Delphi/Pascal, introducing classes for the
graphs, nodes etc. - you may want to use C++ for similar reasons.
I wrote my first decompiler in GfA-Basic. A good interpreted language
can offer many goodies for debugging, like to edit the code at runtime,
to invoke subroutines manually, and to resume in a different place. But
in your case you also can go without such features.
[Some Basic interpreters do a static prepass, some just chug through
the program and keep a stack of open control structures. This one
appears to be of the latter variety. -John]
Return to the
Search the comp.compilers archives again.