Re: analyzing dependencies in code

idbaxter@semdesigns.com
Sat, 17 Nov 2007 09:39:52 -0800 (PST)

          From comp.compilers

Related articles
analyzing dependencies in code eliben@gmail.com (eliben) (2007-11-16)
Re: analyzing dependencies in code idbaxter@semdesigns.com (2007-11-17)
Re: analyzing dependencies in code kamalpr@gmail.com (IndianTechie) (2007-11-17)
Re: analyzing dependencies in code SidTouati@inria.fr (Sid Touati) (2007-12-13)
| List of all articles for this month |
From: idbaxter@semdesigns.com
Newsgroups: comp.compilers
Date: Sat, 17 Nov 2007 09:39:52 -0800 (PST)
Organization: Compilers Central
References: 07-11-047
Keywords: analysis
Posted-Date: 18 Nov 2007 01:07:11 EST

On Nov 16, 2:42 am, eliben <eli...@gmail.com> wrote:
> Hello,
>
> I have a bunch of rather convoluted C code that uses a lot of global
> variables. I'm trying to see if I can devise some automatic way of
> analyzing this code, helping me understand it.
> Suppose I can obtain the AST for this code (using c2c). I want to
> analyze the AST and, given some global variable V, understand which
> other global variables it affects, and what functions can be called as
> a result of a change in it.
>
> I tried approaching this the ad-hoc way, but it's becoming
> complicated. I see some parallels to dataflow analysis here, though it
> seems to be a bit different.
>
> Any ideas of proven methods to do this right ?


If you want to *see* which variables are affected/ what code
executes based on changes to global variable, what you want
is the set of "forward slices" from assignments
to that specific global.
[Obviously "seeing" isn't understanding, that's left to the
reader :-( ]


A forward slice generally defined as the set of statements dependent
on an assignment (any update) to a variable at a specific point in the
source code. [A backward slice is the set of statements that can
affect the value of a variable at a specific point in the source
code].


Computing such slices requires pretty significant machinery. Clearly
you need to parse the program; obtaining an AST is a first step. Then
you need to find all the global variables; I assume you mean across
*all* the compilation units in your source code. Arguably you need a
symbol table for this, although you can probably hack this looking
just for the special syntax cases. But you'll need full control and
data flow analysis (forcing the need for a symbol table) to determine
the downstream (respectively upstream for backward slices)
dependencies. More importantly, because C variables can be affected
by indirect assignments via pointers, you'll need a global (full
application) points-to analysis. These techniques are all well
documented in the compiler literature; in fact, there are a wide
variety of them based on ambition for accuracy or speed. That doesn't
mean you can implement them easily, as you have to take into account
all the vagaries of the C language semantics [including the
dialect-specific issues for your particluar C compiler] and handle the
scale issues that come with doing whole-program analysis.


An older but free implementation of slicing can be found at
http://hissa.nist.gov/unravel/. I suspect it only handles older
dialects of C, as I think that work is dormant. Grammatech offers an
interactive slicing tool
www.grammatech.com/research/slicing/slicingWhitepaper.html as a
product. You give it your source code, graphically point to a place
in the code (e.g., an assignment or a reference to a varible), and say
"slice"; it lights up the code that forms the slice. The way you
defined the problem, you want the *set* of all slices; I don't know if
the Grammatech will do that, but I'd be surprised if they hadn't heard
requests for that.


If you still want to do it yourself, but don't want to build all the
foundation machinery needed, our DMS Software Reengineering toolkit is
used to construct custom tools, It can be obtained with full C parsers
for a variety of dialects of C, and offers APIs for computing control,
dataflow, and points-to analyses, and has been used on whole-system
analyses of C systems with a single load image involving 12 million
lines of code (e.g, some 10000 compilation units, 30000 files, etc.)
Yes, the scale problems are tough.


Ira Baxter, CTO
Semantic Designs


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.