Putting probes in source code (was: Parser)

"Ira Baxter" <idbaxter@semdesigns.com>
12 Sep 2002 14:29:24 -0400

          From comp.compilers


From: "Ira Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: 12 Sep 2002 14:29:24 -0400
Organization: Compilers Central
References: 02-09-078
Keywords: tools, parse
Posted-Date: 12 Sep 2002 14:29:24 EDT

"Aaron Becher" <aaron.becher@eds.com> wrote in message
> I have a VERY large pool of source
> code that needs a certain line of code inserted at the beginning of
> every function (a macro). .... I was thinking a good idea might be to take
> an open source compiler, which knows how to truly parse the code, and
> try to start there.


Our DMS Software Reengineering Toolkit is not open source, but it has
everything you need to do this. See
http://www.semdesigns.com/Products/DMS/DMSToolkit.html. It is
generalized compiler technology, providing parsers, prettyprinters,
tree builders, source-to-source tree transformers, attribute
evaluators, symbol tables, and lots more. It can be obtained with C and
C++ front ends.


Installing probes is pretty easy with DMS.
A rewrite rule of the form

    rule install_probe(name:IDENTIFIER, body:function_body, ...)
                         -- names syntax entity parameters
      : function_decl -> function_decl
                         -- declares rule to map this syntax class to same syntax class
      = " \modifiers \result \name(\params) { \declarations \code } "
                         -- rule lhs pattern
     -> " \modifiers \result \name(\params) { \declarations
                                              log(\name); \code } ".
                         -- rule rhs pattern

will insert

    log(functionname)

at the front of every function. You can, of course, put whatever construct
you want (e.g., your macro call) in place of the log call.
You may need several rules to cover other variants, such as functions
not returning void values, etc.
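
To make the effect concrete, here is a minimal C sketch of what a function
might look like after such a rule has fired. The names LOG_PROBE, log_probe,
probe_count, last_probe and the two sample functions are hypothetical and
exist only for illustration; they are not part of DMS or of the original
poster's code base.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical probe support; stands in for "your macro call". */
static int probe_count = 0;
static char last_probe[64] = "";

static void log_probe(const char *name)
{
    assert(name != NULL);
    probe_count++;
    strncpy(last_probe, name, sizeof last_probe - 1);
    last_probe[sizeof last_probe - 1] = '\0';
}

/* The macro call a rule's right-hand side could insert instead of log(\name). */
#define LOG_PROBE(name) log_probe(#name)

/* Before the transformation:
 *
 *     static int add(int a, int b)
 *     {
 *         int sum;
 *         sum = a + b;
 *         return sum;
 *     }
 */

/* After: the probe sits after the declarations and before the first
   statement, matching the "\declarations log(\name); \code" pattern. */
static int add(int a, int b)
{
    int sum;
    LOG_PROBE(add);
    sum = a + b;
    return sum;
}

/* A void function with no local declarations gets the probe first. */
static void reset(int *p)
{
    LOG_PROBE(reset);
    *p = 0;
}
```

Placing the probe after the declarations, as the rule above does, keeps the
result legal even for compilers that insist on declarations before statements.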


You can see a production example of this explained at our website,
where a white paper on how we use DMS to implement code coverage
tools with probes is available:
http://www.semdesigns.com/Products/TestCoverage/index.html
It takes about 50 rules like this to implement complete coverage for a
language.




Also, I thought I should respond to the moderator's remark.


> [Using a regular parser in a rewrite tool is usually an exercise in
> frustration, because the parser throws away info that you'd want to
> keep in the rewritten source code. C is particularly unpleasant in
> this regard because compilers generally start by expanding the
> preprocessor stuff, then compile the expanded stuff, by which time
> it's pretty much impossible to reconstruct the source. My
> experience is that regular expression pattern matching combined with
> ad-hoc heuristics give about as good result as any. -John]


Yes. A standard lexer/parser pair throws away comments, the radix of
literal values, leading zero counts on hex constants, whether a
floating point number was written in fixed-point form or with "E",
and the line and column on which each token was found. On top of
that, it expands includes and macros and keeps only specific arms
of conditionals.


For software reengineering, you want to use a lexer/parser pair that
does NOT do this, and you basically can't find these off the shelf in
the same way you can find LEX/YACC clones. Then you also need
machinery to regenerate this information from the modified
representation of the source, in a way that minimally damages it while
still honoring the actual changes that got made.
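
As a rough illustration of what "not throwing this away" means at the token
level, here is a minimal C sketch of an integer-literal token that records
how the literal was written as well as its value. IntToken, digit_value and
lex_int_literal are hypothetical names invented for this example; this is
not how DMS represents tokens, just one simple way to capture the idea.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lossless token: the value plus the way it was spelled. */
typedef struct {
    long value;          /* all that a conventional front end keeps */
    int  radix;          /* 8, 10 or 16, as written in the source */
    int  leading_zeros;  /* zeros after any prefix, e.g. 2 for 0x00FF */
    int  line, column;   /* position of the first character */
    char text[32];       /* exact original spelling */
} IntToken;

static int digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Lex one integer literal at the start of s. Returns the number of
   characters consumed, or 0 if s does not start with a literal. */
static size_t lex_int_literal(const char *s, int line, int column,
                              IntToken *tok)
{
    size_t i = 0, start, k;
    int d;

    assert(tok != NULL);
    if (!isdigit((unsigned char)s[0])) return 0;

    tok->radix = 10;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')
        && isxdigit((unsigned char)s[2])) {
        tok->radix = 16;
        i = 2;                      /* skip the 0x prefix */
    } else if (s[0] == '0' && isdigit((unsigned char)s[1])) {
        tok->radix = 8;
        i = 1;                      /* skip the octal prefix */
    }

    start = i;
    tok->value = 0;
    while ((d = digit_value((unsigned char)s[i])) >= 0 && d < tok->radix) {
        tok->value = tok->value * tok->radix + d;
        i++;
    }
    if (i >= sizeof tok->text) return 0;   /* too long for this sketch */

    /* Count zeros that precede at least one more digit. */
    tok->leading_zeros = 0;
    for (k = start; k + 1 < i && s[k] == '0'; k++)
        tok->leading_zeros++;

    tok->line = line;
    tok->column = column;
    memcpy(tok->text, s, i);
    tok->text[i] = '\0';
    return i;
}
```

With a token like this, an unchanged literal can be regenerated by printing
tok.text verbatim, while a literal whose value was changed by a transformation
can be reprinted in the same radix and with the same leading zero count.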


Our software reengineering tools in fact DO capture all this
information. For C and C++ we have a custom preprocessor that does
NOT expand macros and conditionals.


(I'll be the first to admit that it trips over a number
of special cases that look to us like pretty abusive
unstructured uses, e.g.,

    if (...) {
        ....
    #if ....
    } else {
    #endif
        ...
    };

Having said that, we can't handle these in general, but can handle the
nicely structured cases just fine. Yes, this preprocessor was a lot of work.)


This allows one to reproduce lexical information and preprocessor
conditionals unchanged in modified output, and even to apply program
transformations and type inference across the conditionals.
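
For instance, output from a probe-insertion pass that leaves conditionals
alone might look like the following C sketch. USE_SHIFT, LOG_PROBE,
probe_hits and twice are hypothetical names chosen for illustration, and
the fragment is hand-written, not actual tool output.

```c
/* Hypothetical stand-in for the inserted probe macro. */
static int probe_hits = 0;
#define LOG_PROBE(name) (probe_hits++)

/* Both arms of the conditional survive in the transformed source, and
   each definition received its own probe. Nothing was expanded or
   discarded, so the file still builds under either configuration. */
#ifdef USE_SHIFT            /* hypothetical configuration macro */
static int twice(int x)
{
    LOG_PROBE(twice);
    return x << 1;
}
#else
static int twice(int x)
{
    LOG_PROBE(twice);
    return x * 2;
}
#endif
```

A tool built on a conventional preprocessor would see only one of the two
definitions and could instrument only that one.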


There's an interesting paper at SCAM 2002 about reasoning about type
declarations in the face of preprocessor conditionals using these
foundations. See http://www.brunel.ac.uk/%7Ecsstmmh2/scam2002/ We've
done a number of other interesting C/C++ analyses and manipulations,
including clone detection, test coverage probe insertion, removal of
useless preprocessor directives, etc.


But the only way out of the exercise in frustration is to switch to
reengineering infrastructure, not conventional parsing infrastructure.
Unfortunately, this kind of infrastructure isn't covered in textbooks.


--
Ira Baxter, Ph.D. CTO Semantic Designs
www.semdesigns.com 512-250-1018

