# Re: looking for a better way (help with algorithm design)

## Georg Bisseling <Georg.Bisseling@pallas.com>29 Sep 2001 10:59:51 -0400

From comp.compilers

Related articles
looking for a better way (help with algorithm design) mcr+usenet@yuck.net (Michael Roach) (2001-09-25)
Re: looking for a better way (help with algorithm design) mcr+usenet@yuck.net (Michael Roach) (2001-09-25)
Re: looking for a better way (help with algorithm design) Georg.Bisseling@pallas.com (Georg Bisseling) (2001-09-29)
| List of all articles for this month |

 From: Georg Bisseling Newsgroups: comp.compilers Date: 29 Sep 2001 10:59:51 -0400 Organization: Pallas GmbH References: 01-09-107 01-09-108 Keywords: parse Posted-Date: 29 Sep 2001 10:59:51 EDT

Dear Michael,

I will propose a very obvious solution that you might did rule out
since you thought that it needs too much space.

For every word in the set S store the word and any valid abbreviation
in the same tree.

Each node in the tree will have a marker to flag if it is a word in S.

Say S = {nap, nat}

root
/ | \ \
a(PT) n(PT) p(P) t(T)
/| |
(P)p t(T) a(PT)
/ \
T P

Uppercase marks a word in S. Letters in paren- theses show pointers to
words in S (which are identifyed by the two corresponding end nodes)

This solution would not take up more space than storing the set S
alone, provided that a large S gets more and more similar to all
allowed abbreviations. This assumption should be valid for natural
languages and DNA sequences....

Of course it is very difficult to formalize this assumption. Maybe it
is wrong?

It is obvious how to examine a given abbreviation. Follow the tree
until you reach the end of the abbreviation (if not found it is no
abbreviation). For each pointer (if no pointer it is no abbreviation)
trace back to root to get the matching word in S.

ciao
Georg

Post a followup to this message