Trie algorithms?

Tim McDaniel <mcdaniel@portcullis.cpm.com>
29 Apr 1996 23:29:04 -0400

          From comp.compilers

Related articles
Trie algorithms? mcdaniel@portcullis.cpm.com (Tim McDaniel) (1996-04-29)
Re: Trie algorithms? grosch@cocolab.sub.com (1996-05-04)
Re: Trie algorithms? tmcd@crl.com (1996-05-06)
| List of all articles for this month |
From: Tim McDaniel <mcdaniel@portcullis.cpm.com>
Newsgroups: comp.compilers
Date: 29 Apr 1996 23:29:04 -0400
Organization: Compilers Central
Keywords: lex, question, comment

Since this is a kind of lexical matching problem, I thought I'd try
comp.compilers. Request 0: please tell me if there is a more
appropriate group for these questions.


I'm looking for algorithms for quick matching of one of a set of fixed
strings. "Trie"s look interesting, to judge by pp. 151-155 of the 2nd
edition "dragon book" (Aho, Sethi, and Ullman; they are exercises 3.26
thru 3.33). I could just do a simple DFA with backtracking, of
course, but a trie uses at most 2N state transitions in processing an
input of length N (3.32.c), and I've got a *really* *big* *N* in this
application.


The problem is that the algorithms in AS&U just reply "yes" or "no" to
whether at least one keyword exists somewhere in the string.


What I'd like to do, though, is do something on each acceptance state.
It's token replacement, where an N-character keyword must be replaced
by an N-character replacement. Single unmatched characters are left
untranslated (which is what those trie algorithms assume anyway).
Furthermore, I want left-most greedy matching. For keywords


        he -> qi
        hers -> nzzy
        erg -> btu


and input "herg", I want to match "he" (and find its replacement
value), "r", and "g", and produce "qirg". I don't want to match "h"
and then "ergs", which I think the trie algorithms there would do.


I could try to adapt the dragon book algorithms, but perhaps someone
out there has already developed code for this. Does anyone have any
pointers to code, or indications of how I can adapt the dragon book
algorithms?


(And can you tell me whether I should be capitalizing "dragon book"?)


--
mcdaniel@cpm.com is Tim McDaniel at work: Customer Potential Management Corp.,
        East Peoria, IL, USA. The phone number is +1 309 698-1037.
For personal mail please use "tmcd@crl.com".
[I'd just use flex, following some of the hints in the documentation to make
lexers work faster. -John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.