New sublinear string pattern matching algorithm taxonomy

wsinswan@win.tue.nl (Gerard Zwaan)
Sat, 24 Jun 1995 12:26:59 GMT

          From comp.compilers

Related articles
New sublinear string pattern matching algorithm taxonomy wsinswan@win.tue.nl (1995-06-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: wsinswan@win.tue.nl (Gerard Zwaan)
Keywords: report, theory, available, FTP, bibliography
Organization: Eindhoven Univ. of Technology, The Netherlands
Date: Sat, 24 Jun 1995 12:26:59 GMT
Status: RO

We are pleased to announce the availability of our report


"A taxonomy of sublinear multiple keyword pattern algorithms"


by B.W. Watson and G. Zwaan. The abstract of the report is given
below. The report is available at the following URLs:


      ftp://ftp.win.tue.nl/pub/techreports/zwaan/sublin.300dpi.ps.gz
                                                                                    (or CSR95-13.300dpi.ps.gz)
      ftp://ftp.win.tue.nl/pub/techreports/zwaan/sublin.600dpi.ps.gz
                                                                                    (or CSR95-13.600dpi.ps.gz)


As the names indicate, these files are gzipped Postscript files
at resolutions 300dpi and 600dpi, respectively. The files
sublin.README and CSR95-13.README in the same directory
contain the contents of this message.
If you can not print one of the provided files, you can request
a file in another format by sending an email message to
wsinswan@win.tue.nl. Paper copies of the report can also
be requested by either sending an email message to wsinswan@win.tue.nl
or sending mail to


              Eindhoven University of Technology
              Department of Computing Science
              Computing Science Reports
              P.O. Box 513
              5600 MB Eindhoven
              The Netherlands


containing the report number (CSR95-13) and the number of copies desired.


Other questions and/or remarks concerning the report can always be
directed to either wsinswan@win.tue.nl or watson@win.tue.nl.


G. Zwaan and B.W. Watson
Faculty of Mathematics and Computing Science
Eindhoven University of Technology


=========================================================================


                                    A taxonomy of
                          sublinear multiple keyword
                        pattern matching algorithms


                            B.W. Watson & G. Zwaan
        Faculty of Mathematics and Computing Science
                Eindhoven University of Technology
                          P.O. Box 513, 5600 MB
                    Eindhoven, The Netherlands
  email: watson@win.tue.nl or wsinswan@win.tue.nl


                                          Abstract


This paper presents a taxonomy of sublinear keyword pattern matching
algorithms related to the Boyer-Moore algorithm~\cite{BoyerMoore}
and the Commentz-Walter algorithm~\cite{Commentz-Walter,Commentz-Walter2}.
The taxonomy includes, amongst others, the multiple keyword
generalization of the single keyword Boyer-Moore algorithm and
an algorithm by Fan and Su~\cite{FanSu,FanSu:inbook}.
The corresponding precomputation algorithms are presented as well.
The taxonomy is based on the idea of ordering algorithms according
to their essential problem and algorithm details, and deriving
all algorithms from a common starting point by successively adding
these details in a correctness preserving way. This way of presentation
not only provides a complete correctness argument of each algorithm,
but also makes very clear what algorithms have in common
(the details of their nearest common ancestor) and where they differ
(the details added after their nearest common ancestor).
Introduction of the notion of safe shift distances proves to be
essential in the derivation and classification of the algorithms.
Moreover, the paper provides a common derivation for and
a uniform presentation of the precomputation algorithms,
not yet found in the literature.


@ARTICLE{BoyerMoore,
                AUTHOR = {Robert S. Boyer and J. Strother Moore},
                JOURNAL = {Communications of the ACM},
                MONTH = oct,
                NUMBER = {10},
                PAGES = {762--772},
                TITLE = {A Fast String Searching Algorithm},
                VOLUME = {20},
                YEAR = {1977},
                CRINDEX = {3.74, 4.40, 5.25},
}


@INPROCEEDINGS{Commentz-Walter,
                AUTHOR = {Beate Commentz-Walter},
                BOOKTITLE = {Proceedings 6th International Colloquium on
                                                            Automata, Languages and Programming},
                EDITOR = {H.A. Maurer},
                MONTH = jul,
                PAGES = {118--132},
                PUBLISHER = {Springer},
                SERIES = {Lecture Notes in Computer Science},
                TITLE = {A String Matching Algorithm Fast on the Average},
                VOLUME = {71},
                YEAR = {1979},
}


@TECHREPORT{Commentz-Walter2,
                AUTHOR = {Beate Commentz-Walter},
                INSTITUTION = {IBM-Germany, Scientific Center Heidelberg},
                MONTH = sep,
                NUMBER = {TR 79.09.007},
                TITLE = {A String Matching Algorithm Fast on the Average},
                YEAR = {1979},
}


@ARTICLE{FanSu,
                AUTHOR = {Jang-Jong Fan and Key-Yih Su},
                JOURNAL = {IEEE Transactions on Knowledge and
                                                            Data Engineering},
                MONTH = apr,
                NUMBER = {2},
                PAGES = {339--351},
                TITLE = {An Efficient Algorithm for Matching
                                                            Multiple Patterns},
                VOLUME = {5},
                YEAR = {1993}
}


@INCOLLECTION{FanSu:inbook,
                AUTHOR = {Jang-Jong Fan and Key-Yih Su},
                BOOKTITLE = {Computer Algorithms: String Pattern
                                                            Matching Strategies},
                EDITOR = {Jun{-}ichi Aoe},
                PUBLISHER = {IEEE Computer Society Press},
                TITLE = {An Efficient Algorithm for Matching
                                                            Multiple Patterns},
                YEAR = {1994},
                ISSN_ISBN = {ISBN 0-8186-5462-7}
}


--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.