Re: comparing two programs?

khorsell@ee.latrobe.edu.au (Kym Horsell)
Tue, 26 Apr 1994 06:19:34 GMT

          From comp.compilers

Related articles
comparing two programs? abykov@pollux.usc.edu (1994-04-18)
Re: comparing two programs? bosullvn@maths.tcd.ie (Bryan O'Sullivan) (1994-04-19)
Re: comparing two programs? jan@neuroinformatik.ruhr-uni-bochum.de (1994-04-20)
Re: comparing two programs? abykov@pollux.usc.edu (1994-04-20)
Re: comparing two programs? khorsell@ee.latrobe.edu.au (1994-04-26)
Re: comparing two programs? masjpb@midge.bath.ac.uk (1994-04-29)
| List of all articles for this month |

Newsgroups: comp.compilers
From: khorsell@ee.latrobe.edu.au (Kym Horsell)
Keywords: tools
Organization: Department of Electronic Engineering, La Trobe University
References: 94-04-124
Date: Tue, 26 Apr 1994 06:19:34 GMT

abykov@pollux.usc.edu (Alexander Bykov) writes:
>I need the program that would compare two programs for being identical.


This problem is about as old as grading computer assignments. ;-)


But directly comparing programs pairwise is *not* what you really want to
do. Obviously such a procedure will get rather expensive when comparing
all pairs of program handed in in the average undergrad programming class
(i.e. of the order of 100 programs per week).


You sould therefore try to extract a small number of distinctive
"features" from the relevant programs and compare these statistically. Any
program pair/group that give an appropriately large likelyhood of being
identical can be examined manually; and such a procedure is "closer" ;-)
to O(n).


I haven't any references on hand. But the suggestions I have read include
measuring features such as
- number of non-dead variables of simple types
- number of non-dead statements of simple types
- average number of operators per statement


These measures can usually be extracted from compilers and, I think,
portable PD compilers like gcc that produce a relatively straightforward
intermediate code (via the -r option) may be a starting point.


I would presume such has already been done (& available via ftp) a number
of times.


-kym


BTW, a typical anecdote.


I was once in the unenvious position of grading grad/undergrad C
programs. I remember a couple of times finding more than one
program within the same group with a very distinctive construct;
this made finding cheats a little easier. In one case the
assignment involved, among other things, testing characters
for upper- or lower-case membership (in order to handle a user
interface that allowed "a-z" and "A-Z" notations). The programs
in question both had a statement containing:


(c='a')|(c='b')|...(c='z')|(c='A')|...(c='Z')


and the individuals concerned *denied* any knowlege of how this came
to be. The problem (for them) was that I had been in the terminal room
when I heard them exchanging the source code.


I'm sure there's (at least) a couple of points here concerning
the design of automated plagarism-detecting. ;-)
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.