# Re: Levinshtein Distances

## world!jhallen@uunet.uu.net (Joseph H Allen)18 Nov 1999 02:49:15 -0500

From comp.compilers

Related articles
Levinshtein Distances qjackson@wave.home.com (Quinn Tyler Jackson) (1999-11-16)
Re: Levinshtein Distances derek@knosof.co.uk (1999-11-18)
Re: Levinshtein Distances world!jhallen@uunet.uu.net (1999-11-18)
Re: Levinshtein Distances lionel_delafosse@mail.dotcom.fr (Lionel Delafosse) (1999-11-19)
Re: Levinshtein Distances Burak.Emir@dimail.epfl.ch (Burak Emir) (1999-12-20)
| List of all articles for this month |

 From: world!jhallen@uunet.uu.net (Joseph H Allen) Newsgroups: comp.compilers,comp.editors Date: 18 Nov 1999 02:49:15 -0500 Organization: The World Public Access UNIX, Brookline, MA Distribution: inet References: 99-11-083 Keywords: theory

Quinn Tyler Jackson <qjackson@wave.home.com> wrote:
>Would anyone happen to know where I might be able to find C or C++ code that
>implements the Levinshtein Distance (minimal edit distance) metric?

Gnu-emacs uses a version of this dynamic programming algorithm for
dealing with dumb-terminal scrolling. It was invented for that
purpose by James Gosling- hmm... the same James Gosling of Java fame?
I'm not sure. Anyway, I have included some code.

Compile and run it. Enter two short strings (up to 100 chars) into
standard input. The first is the original, the second is what you
want after the edit:

abaa
aaba

It will print the following 2-d array:

a a b a
0 1 2 3 4
----------------------------------------
0 |0 1i 2i 3i 4i
a 1 |1d 0r 1r 2i 3r
b 2 |2d 1d 1r 1r 2i
a 3 |3d 2d 1r 2d 1r
a 4 |4d 3d 2d 2r 2d

The edit distance is the value of the lower right hand corner of the array
(2). The letter after each number indicates the action: r=replace/copy,
d=delete, i=insert. You can construct the edit by tracing from the lower
right corner to the upper left corner. For each r, move diagonally towards
the upper left. For each d, move up. For each i, move left. The path for
this pair of strings is 2d, 1r, 1r, 1r, 1i.

char cur[100]; int curlen;
char new[100]; int newlen;
struct
{
int d; /* Delete cost */
int i; /* Insert cost */
int r; /* Replace cost */
int pick; /* Which did we pick (0=delete, 1=insert, 2=replace) */
int tot; /* Resulting cost */
} ary[100*100];

char mtab[]=" dir";
char *tab=mtab+1;

main()
{
int y, x;
gets(cur); curlen=strlen(cur);
gets(new); newlen=strlen(new);

ary[0].d= -1;
ary[0].i= -1;
ary[0].r= -1;
ary[0].pick= -1;
ary[0].tot= 0;

for(x=1;x!=newlen+1;++x)
ary[x].d= -1,
ary[x].i= 1+ary[x-1].tot,
ary[x].r= -1,
ary[x].pick= 1,
ary[x].tot= 1+ary[x-1].tot;

for(y=1;y!=curlen+1;++y)
{
ary[y*(newlen+1)].d= 1+ary[(y-1)*(newlen+1)].tot,
ary[y*(newlen+1)].i= -1,
ary[y*(newlen+1)].r= -1,
ary[y*(newlen+1)].pick=0,
ary[y*(newlen+1)].tot= 1+ary[(y-1)*(newlen+1)].tot;

for(x=1;x!=newlen+1;++x)
{
int i, d, r;
d=ary[y*(newlen+1)+x].d=1+ary[(y-1)*(newlen+1)+x].tot;
i=ary[y*(newlen+1)+x].i=1+ary[y*(newlen+1)+x-1].tot;
if(cur[y-1]==new[x-1]) r=ary[y*(newlen+1)+x].r=ary[(y-1)*(newlen+1)+x-1].tot;
else r=ary[y*(newlen+1)+x].r=1+ary[(y-1)*(newlen+1)+x-1].tot;
if(i<r)
if(i<d) /* i */
ary[y*(newlen+1)+x].tot=i, ary[y*(newlen+1)+x].pick=1;
else /* d */
ary[y*(newlen+1)+x].tot=d, ary[y*(newlen+1)+x].pick=0;
else if(r<d) /* r */
ary[y*(newlen+1)+x].tot=r, ary[y*(newlen+1)+x].pick=2;
else /* d */
ary[y*(newlen+1)+x].tot=d, ary[y*(newlen+1)+x].pick=0;
}
}

printf("\t\t");
for(x=0;x!=(newlen+1);++x) printf("%c\t",new[x]);
printf("\n\t");
for(x=0;x!=newlen+1;++x) printf("%d\t",x);
printf("\n\t");
for(x=0;x!=newlen+1;++x) printf("--------");
printf("\n");

for(y=0;y!=curlen+1;++y)
{
if(y) printf("%c %d |",cur[y-1],y);
else printf(" %d |",y);
for(x=0;x!=newlen+1;++x)
printf("%d%c\t",ary[x+(newlen+1)*y].tot,tab[ary[x+(newlen+1)*y].pick]);
printf("\n");
}
}
--
/* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */

Post a followup to this message