Re: Control Flow Analysis for Languages with Exceptions (e.g. Java)

Hans-Peter Diettrich <DrDiettrich1@aol.com>
5 Nov 2006 23:35:52 -0500

          From comp.compilers

Related articles
Control Flow Analysis for Languages with Exceptions (e.g. Java) owong@castortech.com (Oliver Wong) (2006-11-04)
Re: Control Flow Analysis for Languages with Exceptions (e.g. Java) DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-11-05)
Re: Control Flow Analysis for Languages with Exceptions (e.g. Java) ajonospam@andrew.cmu.edu (Arthur J. O'Dwyer) (2006-11-05)
Re: Control Flow Analysis for Languages with Exceptions (e.g. Java) int2k@gmx.net (Wolfram Fenske) (2006-11-05)
Re: Control Flow Analysis for Languages with Exceptions (e.g. Java) usenet@gmx.info (Michael Klemm) (2006-11-15)
| List of all articles for this month |

From: Hans-Peter Diettrich <DrDiettrich1@aol.com>
Newsgroups: comp.compilers
Date: 5 Nov 2006 23:35:52 -0500
Organization: Compilers Central
References: 06-11-017
Keywords: analysis, errors
Posted-Date: 05 Nov 2006 23:35:52 EST

Oliver Wong wrote:
> I'm a novice to compiler theory, and I've been doing some reading on
> control flow analysis.
>
> It looks like the literature all agrees that the first step to
> CFA, once you have the AST, is to determine what the basic blocks are
> for your code. From what I understand a basic block is a sequence of
> instructions such that if any one of those instructions are executed,
> then all the instructions in that sequence are executed.


I'd rephrase this statment:
All instructions in a BB can execute only after entering the block, i.e.
after the first instruction has been invoked.




> In the case of Java (and probably other languages with exception,
> though I don't have experience with any of them), every instruction
> would be its own basic block, because an exception could potentially
> be thrown at any point, thus guaranteeing that there does not exist
> any pair of points such that you could be certain that the second
> instruction would execute, given that the first was executed.


IMO special handling of SEH in CFA is required, depending on the goal of
the analysis. If only the components, loops etc. are explored, normal
operation (without exceptions) can be assumed, exception handlers can be
dropped as dead code, and Try/Finally blocks can be merged into single
blocks (at least in the next steps of the reduction of the CFG).


When exceptions are taken into account, all finally blocks and exception
handlers can be entered from every instruction in the according try
block, with according handling of the SSA Phi nodes. Synthetic control
flow paths must be added, from the end of finally blocks, and from
exception blocks in the case of unhandled exceptions, to the end of the
subroutine or to the next enclosing except/finally block.




> As a bonus question: A lot of my interest in this is due to my
> trying to understand how the control flow analysis for the FindBugs
> software works (http://sourceforge.net/projects/findbugs). FindBugs
> defines a class called Location, and puts in the documentation:
>
> <quote>
> Because of JSR subroutines, the same instruction may actually
> be part of multiple basic blocks (with different facts
> true in each, due to calling context)
> </quote>
>
> This surprised me quite a bit. In all of the literature I've seen
> thus far, instructions should only belong to a single basic block at
> one time, so I'm trying to understand what deviations from the
> literature FindBugs did to implement CFA in Java.


As mentioned above, CFA may serve several purposes. A static analysis
usually is performed on, and limited to, single subroutines, without
taking into account called subroutines (the calls become single
instructions), and ignoring control flow modified by exceptions.
Sometimes subroutine calls terminate an BB, so that special cases (no
return, return to other locations...) can be reflected in the CFG.


The above analysis seems to be a more global one, taking into account
side effects in called subroutines and (possibly) exceptions. IMO the
analysis of such an resulting monster CFG, with all called subroutines
embedded, must impose specific criteria, in order to extract usable
information from such a bunch of informations.




> So concretely, I'm more interested in this second "bonus"
> question, but if you are unfamiliar with FindBugs, I suspected my more
> abstract first question might be easier to answer, and would probably
> lead me in the right direction.


I'm not familiar with FindBugs, my concern till now only was the
(re)construction of the static structure of subroutines.


DoDi


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.