Related articles:
approximate string matching okrslar@informatik.uni-muenchen.de (Martin Okrslar) (2000-01-15)
Re: approximate string matching jrs@JustMakesSense.com (2000-01-15)
Re: approximate string matching rweaver@ix.netcom.com (2000-01-19)
Re: approximate string matching lionel_delafosse@mail.dotcom.fr (Lionel Delafosse) (2000-01-19)
Re: approximate string matching maratb@CS.Berkeley.EDU (Marat Boshernitsan) (2000-01-19)
Re: approximate string matching torbenm@diku.dk (2000-01-19)
From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.compilers
Date: 19 Jan 2000 01:15:39 -0500
Organization: Department of Computer Science, U of Copenhagen
References: 00-01-044
Keywords: theory
Martin Okrslar <okrslar@informatik.uni-muenchen.de> writes:
>I would like to 'cluster' some files by their syntactic similarity.
>(I am grading some students' homework, and since I showed them with
>'diff' that they had done a simple cp, they started changing their
>files minimally, so that a first glance at the diff does not reveal
>that the file is a copy. Note: we are not allowed to lower anyone's
>score just because we 'suspect' copying. I just want to show the
>students that we are not completely dumb.)
I read on The Register (http://www.theregister.co.uk) some months ago
that a professor at Glasgow University wrote a program to find
similarities in student reports and found some cases of cheating this
way. He apparently intends to sell the program to other universities.
A simple test that handles things like interchanged sections or the
systematic replacement of one word by another is to gzip both files and
check whether they come out at nearly the same size. This can obviously
give a lot of false positives, but it might work as an initial coarse
sort.
This procedure is probably best for program texts, where most
systematic wrangling won't change the gzipped size very much.
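A minimal sketch of that check (Python; the function names and the 10%
tolerance are my own choices for illustration, not part of any existing
tool):

    import gzip

    def gzipped_size(path):
        # Compressed length of the file's contents.
        with open(path, "rb") as f:
            return len(gzip.compress(f.read(), compresslevel=9))

    def possibly_copied(path_a, path_b, tolerance=0.10):
        # Flag the pair if the gzipped sizes differ by less than the
        # given fraction -- a coarse screen, not evidence of copying.
        a = gzipped_size(path_a)
        b = gzipped_size(path_b)
        return abs(a - b) / max(a, b) < tolerance

Pairs flagged this way still need a closer look by hand (e.g. with
diff), since unrelated files of similar length and vocabulary will also
compress to roughly the same size.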
Torben Mogensen (torbenm@diku.dk)