27 Dec 1998 01:06:44 -0500

Related articles |
---|

Cyclomatic Complexity using GCC lundsgas@ssd.comm.mot.com (Soren Lundsgaard) (1998-12-27) |

From: | Soren Lundsgaard <lundsgas@ssd.comm.mot.com> |

Newsgroups: | comp.compilers |

Date: | 27 Dec 1998 01:06:44 -0500 |

Organization: | Compilers Central |

Keywords: | optimize, GCC |

To: comp.compilers moderated newsgroup and egcs mailing list.

The following is a letter I sent to Jim Willson at Cygnus about some

Cyclomatic Complexity work I did here at Motorola.

I have included some notes from earlier correspondence with Jim Wilson

to shed more light on this information. These are preceeded with '> '.

Could someone in the egcs mailing list work to implement this

computation in gcov? Are there other metrics that could be gleaned

from GCC using similar mechanisms?

-------------

Dear Jim Wilson,

I did a bit of work on my employer's time to investigate using the gcc

version 2.8.1 .bb and .bbg basic branch graph files to possibly

compute cyclomatic complexity of modules. I have very positive

results to report, computations of cyclomatic complexity are not only

possible, but quite easy to do using these files, with some changes to

gcov.c

I have sent out a query on our company wide newsgroup about turning over

the changes to the egcs project. I received no positive responses except

one person who mentioned that there were probably groups in Motorola who

contributed the port of gcc to one of our processors.

Due to the perceived difficulty of getting our legal department to sign

off on releasing the code changes to gcov, I would like to describe my

research and the resulting algorithm, which I do not believe could be

construed to be proprietary information, as it is derived from information

emanating from outside the corporation, together with obvious conclusions

based on that information.

As you recall, I queried you about the flags in the .bbg file, and you

described to me their different meaning. I had developed software (C

programs and an (n)awk script) to emit the graph based on the .bb and .bbg

files in a format suitable for processing by a program which is adept at

displaying directed graphs (dot, part of graphviz from AT&T). When I

examined the resulting graph without edges that included the "fake" flag,

it became obvious that this graph represented the "classic" graph of flow

control through a function.

*> gcov creates a flow graph containing all of the arcs. Some of these*

*> arcs represent branches, some of them represent places where code*

*> joins at a label. If we have a basic block that does not end with a*

*> branch, then it will have an arc marked fall_through that connects*

*> it with the following basic block. So an arc marked fall_through*

*> means that there is no branch instruction associated with the arc.*

*> These arcs are very easy to instrument. These arcs do not have*

*> branch probabilities associated with them.*

*> Function calls to setjmp may return more times than called. In*

*> order to account for this, I add a fake arc from the function*

*> entrance to the basic block immediately after the call. Any extra*

*> returns get counted as transitions on this fake arc.*

I modified gcov.c to walk through the graph data and count the number

of edges that were not "fake" and the number of vertices which were

connected to at least one non-"fake" edge, either coming into or

flowing out of the vertex. I then also counted the number of vertices

which were "terminal", meaning that all of the non-"fake" edges that

connected to the vertex either all flowed into the the vertex or

flowed out of the vertex.

The classic computation of cyclomatic complexity assumes that the

graph is strongly connected, and the computation then becomes:

#edges - #vertices + 1

The normal case is that there is one vertex (a) which has no

predecessor vertices, and one vertex (z) which has no successor

vertices, so the computation (taking into account the "virtual" edge

a->z not included in the number #edges below which makes the graph

strongly connected) becomes:

#edges - #vertices + 2

In the case that there are two vertices (a, b) which have no

predecessor vertices and one vertex (z) with no successor edges, the

cyclomatic complexity increases by one, for the one extra vertex (v)

and three additional edges (z->v, v->a, v->b) which need to be added

to make the graph strongly connected:

#edges - #vertices + 3

As you can see, the cyclomatic complexity computation

then becomes:

#edges - #vertices + #terminal_vertices

Gcc emits graphs which, after deletion of fake edges and unconnected

vertices, have no connection from their starting point to their end

points. Graphs can have multiple vertices with no successor edges,

typically due to a call to exit(). In addition, I saw one example in

a function which was compiled for a 68k machine where a vertex had no

predecessor edges.

I would like to suggest that a modification be made to gcov.c to make

this computation of cyclomatic complexity available. The structure

bb_info_list can keep track of function names via some simple addtions

in gcov, and the computation can be completed if the appropriate flag

is added to the command line. Unfortunately, at this time, because

of my contract with my employer, I am not able to release my changes

to gcc/gcov.c

Thank you for your help in this project.

Finally, I would like to post this letter with quotations from your

e-mail to me to the comp.compilers newsgroup. Please let me know if

it would be permissible to include these quotations.

*> Yes. That is OK.*

*> I'd also suggest sending it to egcs@cygnus.com. That is where most*

*> egcs discussion takes place, and is the best place for finding a*

*> volunteer to finish the work you started.*

skl.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.