Re: Are there any tools that can compare two source files

"Ira D. Baxter" <idbaxter@semdesigns.com>
22 Dec 2001 22:56:42 -0500

          From comp.compilers

Related articles
[2 earlier articles]
Re: Are there any tools that can compare two source files sarath_kumar@mentorg.com (Sarath Kumar) (2001-12-15)
Re: Are there any tools that can compare two source files nmm1@cus.cam.ac.uk (2001-12-19)
Re: Are there any tools that can compare two source files joachim_d@gmx.de (Joachim Durchholz) (2001-12-20)
Re: Are there any tools that can compare two source files torbenm@eir.diku.dk (2001-12-20)
Re: Are there any tools that can compare two source files Martin.Ward@durham.ac.uk (2001-12-20)
Re: Are there any tools that can compare two source files thp@cs.ucr.edu (2001-12-20)
Re: Are there any tools that can compare two source files idbaxter@semdesigns.com (Ira D. Baxter) (2001-12-22)
Re: Are there any tools that can compare two source files tej@melbpc.org.au (Tim Josling) (2001-12-22)
| List of all articles for this month |
From: "Ira D. Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: 22 Dec 2001 22:56:42 -0500
Organization: Compilers Central
References: 01-12-027 01-12-058 01-12-076
Keywords: tools
Posted-Date: 22 Dec 2001 22:56:42 EST

"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
> Sarath Kumar <sarath_kumar@mentorg.com> wrote:
> >
> > Generalizing the problem refers to the problem of finding the
> >semantic equivalence of two programs. Is this possible??


> It is not possible. Not even in exponential time. It is at least as
> hard as deciding whether they will halt (for obvious reasons).
>
> More seriously, I believe that the canonical rearrangement problem IS
> soluble, and I am surprised that there aren't more tools to do it.
> This is because it has two important uses:
>
> 1) Signatures for separately compiled codes, to determine
> whether rebuilding is necessary.
>
> 2) Checking for the inheritance of codes, especially in a
> copyright context.


We have a tool, the Clone "Doctor" that detects "similar" code by
doing rather the sort-of-obvious syntax tree comparisons.
http://www.semdesigns.com/Products/Clone/index.html.


Its purpose is to detect "reused" (copied [stolen?]) code. It seems
to be pretty good at that, even though it has only a tiny
"understanding" of program semantics (it understands "sequences").
We've thought a bit about adding certain types of semantic
equivalences (ignoring accidental statement order in the absence of
dependencies), but for practical purposes syntax-tree compares seem
pretty good.


Other researchers have used dynamic programming to detect similarity
with smallest number of edits.
--
Ira D. Baxter, Ph.D. CTO Semantic Designs, Inc.
http://www.semdesigns.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.