Interpreted Basic control-flow analysis

"Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu>
5 Jul 2006 15:19:19 -0400

From comp.compilers

Related articles
*Interpreted Basic control-flow analysis ajonospam@andrew.cmu.edu (Arthur J. O'Dwyer)* (2006-07-05)**
Re: Interpreted Basic control-flow analysis DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-07-06)
Re: Interpreted Basic control-flow analysis chris.dollin@hp.com (Chris Dollin) (2006-07-06)
Re: Interpreted Basic control-flow analysis ajonospam@andrew.cmu.edu (Arthur J. O'Dwyer) (2006-07-06)

| List of all articles for this month |

From:	"Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu>
Newsgroups:	comp.compilers
Date:	5 Jul 2006 15:19:19 -0400
Organization:	Carnegie Mellon, Pittsburgh, PA
Keywords:	analysis, question
Posted-Date:	05 Jul 2006 15:19:19 EDT

[multi-posted to moderated comp.compilers, unmoderated comp.programming]

I have an apparently non-trivial problem in control-flow analysis.
I'm trying to make a static control-flow analyzer for programs written
in TI-83 Basic, an interpreted language. The input is a Basic program,
line by line; and the output is a mapping from lines to lists of their
successors. (See example below.)

      The important control structures supported by TI-83 Basic are
If-Then-Else-End, While-End, and Goto-Label.

      At runtime, the interpreter maintains a stack of all the currently
enclosing control structures. A new entry is pushed when encountering
If-Then or While, and popped when encountering End. Notice that Goto
does not affect the state of the control stack --- this is the crux
of the problem! The runtime nesting of control structures isn't
necessarily reflected by a lexical nesting.

      The successors of If-Then, Else, While, and Goto are always
lexically determined[1] and thus easy to compute. The successors of
"End" are what I'm having trouble with.

      Consider the following trivial example, in which the lexical nesting
does reflect the runtime semantics:

          1. If X successors are 2
          2. Then " 3, 8
          3. While Y " 4, 6
          4. A " 5
          5. End " 4, 6
          6. B " 7
          7. End " 8
          8. C " none

      Now consider the following example, in which the lexical nesting
doesn't quite match the runtime behavior:

          1. While X " 2, 4
          2. Goto L " 6
          3. End " unreached
          4. While Y " 5, 7
          5. Lbl L " 6
          6. End " 2, 5
          7. A " none

The "End" in line 6 corresponds to "While X" if it was reached
via line 2 ("Goto L"), but it corresponds to "While Y" if it was
reached via line 4 (which is where control ends up if X is false
the first time line 1 is reached).

      I'm looking for an algorithm that can handle this kind of language.
I want the resulting successor list to be as conservative as possible
--- e.g., it is not acceptable to give every line in the program as
the successor list of every End. (I don't care what's given for
unreachable lines like line 3 above, though.)

      I've thought of recursively simulating a run through the program,
taking both branches at each decision point, and keeping control
stacks for each thread so I always know where each "End" goes. But
that needs a way to detect and avoid infinite loops. The only way
I've thought of is to avoid duplicating any (line number, control
stack) state; but that involves keeping arbitrarily large amounts
of data, and still doesn't deal with the following program, which
keeps expanding the control stack forever since it never closes any
of its "While" loops.

          1. Lbl L " 2
          2. While X " 3
          3. Goto L " 2

      Any ideas?

      (Possibly useful notes: I'm writing the analyzer in C. Parsing
lines to see whether they're "If..." or "While..." is easy; parsing
expressions is harder than usual, and I'd prefer not to do it at
all, for the time being. TI-83 Basic also has control structures
that let you skip exactly one line, or "hide" a line from the
lexical thingie[1] but have it run at runtime. I don't think they're
relevant to the problem, but I'll explain more if asked.)

-Arthur

[1] - The two [1]s refer to the same thingie.

[Early versions of Fortran permitted something like this known as the
extended range of a DO loop, in which you jumped out and then back in
and continued the loop. There might be some useful work there.

Failing that, I'd suggest pattern matching to look for some relatively
easy to analyze jump structures and other than that, punt. I wouldn't
be surprised if you could come up with a small set of patterns that
match all of the useful programs with jumps out of loops, and all the
programs you can't match don't work anyway. -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Interpreted Basic control-flow analysis

"Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu>5 Jul 2006 15:19:19 -0400

"Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu>
5 Jul 2006 15:19:19 -0400