Related articles |
---|
Levinshtein Distances qjackson@wave.home.com (Quinn Tyler Jackson) (1999-11-16) |
Re: Levinshtein Distances derek@knosof.co.uk (1999-11-18) |
Re: Levinshtein Distances world!jhallen@uunet.uu.net (1999-11-18) |
Re: Levinshtein Distances lionel_delafosse@mail.dotcom.fr (Lionel Delafosse) (1999-11-19) |
Re: Levinshtein Distances Burak.Emir@dimail.epfl.ch (Burak Emir) (1999-12-20) |
From: | world!jhallen@uunet.uu.net (Joseph H Allen) |
Newsgroups: | comp.compilers,comp.editors |
Date: | 18 Nov 1999 02:49:15 -0500 |
Organization: | The World Public Access UNIX, Brookline, MA |
Distribution: | inet |
References: | 99-11-083 |
Keywords: | theory |
Quinn Tyler Jackson <qjackson@wave.home.com> wrote:
>Would anyone happen to know where I might be able to find C or C++ code that
>implements the Levinshtein Distance (minimal edit distance) metric?
Gnu-emacs uses a version of this dynamic programming algorithm for
dealing with dumb-terminal scrolling. It was invented for that
purpose by James Gosling- hmm... the same James Gosling of Java fame?
I'm not sure. Anyway, I have included some code.
Compile and run it. Enter two short strings (up to 100 chars) into
standard input. The first is the original, the second is what you
want after the edit:
abaa
aaba
It will print the following 2-d array:
a a b a
0 1 2 3 4
----------------------------------------
0 |0 1i 2i 3i 4i
a 1 |1d 0r 1r 2i 3r
b 2 |2d 1d 1r 1r 2i
a 3 |3d 2d 1r 2d 1r
a 4 |4d 3d 2d 2r 2d
The edit distance is the value of the lower right hand corner of the array
(2). The letter after each number indicates the action: r=replace/copy,
d=delete, i=insert. You can construct the edit by tracing from the lower
right corner to the upper left corner. For each r, move diagonally towards
the upper left. For each d, move up. For each i, move left. The path for
this pair of strings is 2d, 1r, 1r, 1r, 1i.
char cur[100]; int curlen;
char new[100]; int newlen;
struct
{
int d; /* Delete cost */
int i; /* Insert cost */
int r; /* Replace cost */
int pick; /* Which did we pick (0=delete, 1=insert, 2=replace) */
int tot; /* Resulting cost */
} ary[100*100];
char mtab[]=" dir";
char *tab=mtab+1;
main()
{
int y, x;
gets(cur); curlen=strlen(cur);
gets(new); newlen=strlen(new);
ary[0].d= -1;
ary[0].i= -1;
ary[0].r= -1;
ary[0].pick= -1;
ary[0].tot= 0;
for(x=1;x!=newlen+1;++x)
ary[x].d= -1,
ary[x].i= 1+ary[x-1].tot,
ary[x].r= -1,
ary[x].pick= 1,
ary[x].tot= 1+ary[x-1].tot;
for(y=1;y!=curlen+1;++y)
{
ary[y*(newlen+1)].d= 1+ary[(y-1)*(newlen+1)].tot,
ary[y*(newlen+1)].i= -1,
ary[y*(newlen+1)].r= -1,
ary[y*(newlen+1)].pick=0,
ary[y*(newlen+1)].tot= 1+ary[(y-1)*(newlen+1)].tot;
for(x=1;x!=newlen+1;++x)
{
int i, d, r;
d=ary[y*(newlen+1)+x].d=1+ary[(y-1)*(newlen+1)+x].tot;
i=ary[y*(newlen+1)+x].i=1+ary[y*(newlen+1)+x-1].tot;
if(cur[y-1]==new[x-1]) r=ary[y*(newlen+1)+x].r=ary[(y-1)*(newlen+1)+x-1].tot;
else r=ary[y*(newlen+1)+x].r=1+ary[(y-1)*(newlen+1)+x-1].tot;
if(i<r)
if(i<d) /* i */
ary[y*(newlen+1)+x].tot=i, ary[y*(newlen+1)+x].pick=1;
else /* d */
ary[y*(newlen+1)+x].tot=d, ary[y*(newlen+1)+x].pick=0;
else if(r<d) /* r */
ary[y*(newlen+1)+x].tot=r, ary[y*(newlen+1)+x].pick=2;
else /* d */
ary[y*(newlen+1)+x].tot=d, ary[y*(newlen+1)+x].pick=0;
}
}
printf("\t\t");
for(x=0;x!=(newlen+1);++x) printf("%c\t",new[x]);
printf("\n\t");
for(x=0;x!=newlen+1;++x) printf("%d\t",x);
printf("\n\t");
for(x=0;x!=newlen+1;++x) printf("--------");
printf("\n");
for(y=0;y!=curlen+1;++y)
{
if(y) printf("%c %d |",cur[y-1],y);
else printf(" %d |",y);
for(x=0;x!=newlen+1;++x)
printf("%d%c\t",ary[x+(newlen+1)*y].tot,tab[ary[x+(newlen+1)*y].pick]);
printf("\n");
}
}
--
/* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
Return to the
comp.compilers page.
Search the
comp.compilers archives again.