More Taxonomies and Toolkits of Algorithms (my PhD dissert.)

watson@win.tue.nl (Bruce W. Watson)
Fri, 18 Aug 1995 15:57:24 GMT

          From comp.compilers

Related articles
More Taxonomies and Toolkits of Algorithms (my PhD dissert.) watson@win.tue.nl (1995-08-18)
| List of all articles for this month |

Newsgroups: comp.compilers
From: watson@win.tue.nl (Bruce W. Watson)
Keywords: report, tools, available
Organization: Eindhoven University of Technology, the Netherlands
Date: Fri, 18 Aug 1995 15:57:24 GMT

Hi All,
        I would like to announce the availability of my PhD dissertation. The full
citation is:


        Watson, Bruce W. ``Taxonomies and Toolkits of Regular Language
        Algorithms'', Faculty of Computing Science, Eindhoven University of
        Technology, 1995.


The research reported in it should appeal to anyone who read and used (and
hopefully enjoyed) my previous taxonomies or C++ toolkits. The summary appears
later in this posting. While the software will be available for anonymous ftp
from Eindhoven, only the paper version of the dissertation is (for now)
available.


If you would like a copy of it, please send me your full mailing address. If
you are interested in puting a copy in your company/university/institute
library, just ask for an extra copy. I would also welcome any suggestions of
other people who may be interested in a copy.




Bruce. (watson@win.tue.nl)




Summary:
-------


A number of fundamental computing science problems have been studied since the
1950s and 1960s. For each of these problems, numerous solutions (in the form of
algorithms) have been developed over the years. In these collections of
solutions, we can identify the following three deficiencies:


1. Algorithms solving the same problem are difficult to compare to one
        another. This could be due to the use of different programming languages,
        paradigms, styles of presentation --- or simply the addition of unnecessary
        details.


2. Collections of algorithm implementations solving a problem are difficult to
        find. Some of the algorithms are presented in a relatively obsolete manner,
        either using a now-defunct notation or obsolete programming language,
        making it difficult to either implement the algorithm or find an existing
        implementation.


3. Little is known about the comparative practical running time performance
        of the algorithms. The lack of existing implementations in one and the same
        framework, especially of some of the older algorithms, has made it
        difficult to determine the running time characteristics of the algorithms.
        Selection of an algorithm must then be made on the basis of the theoretical
        running time, or simply by guessing.


In this dissertation, a solution to each of these deficiencies is presented. To
present the solutions, we use the following three fundamental computing science
problems:


1. Keyword pattern matching in strings. Given a finite non-empty set of
        keywords (the patterns) and an input string, find the set of all
        occurrences of a keyword as a substring of the input string.


2. Finite automata (FA) construction. Given a regular expression, construct a
        finite automaton which accepts the language denoted by the regular
        expression.


3. Deterministic finite automata (DFA) minimization. Given a DFA, construct
        the unique minimal DFA accepting the same language.


In the following paragraphs, we will outline the solutions presented for each
of the deficiencies.


The difficulty in comparing algorithms is overcome by creating a {\em taxonomy}
of algorithms for a given problem. Each of the algorithms is rewritten in a
common notation and is examined to determine the essential ingredients that
distinguish it from any other algorithm. These ingredients (known as {\em
details}) can take the form of problem details (a restriction on the class
of problems solved), or algorithm details (some correctness-preserving change
to the algorithm to improve efficiency). Once each algorithm has been reduced
in such a way, it can be characterized by its set of details. In presenting the
taxonomy, the common details of several algorithms can be factored and
presented together. In this fashion, a `family tree' of the algorithms is
constructed, showing clearly what any two algorithms have in common and where
they differ. Because the root of the family tree is a na\"{\i}ve, and correct,
algorithm and the details are applied in a correctness-preserving manner, the
correctness argument for each of the algorithms is implicit in the taxonomy.


The common notation and presentations in the taxonomies enable us to implement
the algorithms uniformly, in the form of a {\em class library} (also known as a
{\em toolkit}). The factoring of essential details, inherent in the taxonomies,
leads to factoring of common components in the inheritance hierarchy of the
class library. Object-oriented concepts, such as virtual (or deferred) member
functions, inheritance and template classes, prove to be useful in presenting a
coherent class library, the structure of which reflects the taxonomy from which
it was created. For the first time, most (if not all) solutions are presented
in a single class library, giving clients of the library a large choice of
objects and functions.


With a class library that contains most of the known solutions, we are finally
able to gather data on the performance of the algorithms in practice. Since
the algorithms are taken from a single class library which was implemented by
one person, and the quality of implementation of the class library is
homogeneous, the relative performance data gathered (comparing algorithms) is
not biased by the implementation. This performance data allows software
engineers to make more informed decisions (based upon their needs, and the
characteristics of their input data) concerning which algorithm and therefore
which objects and functions to use in their applications.


The development of the taxonomies has not been without its spin-offs. In each
of the three taxonomies presented, significant new algorithms have been
developed. These algorithms are also implemented in the corresponding class
libraries. The techniques developed in the taxonomy of pattern matching
algorithms proved to be particularly useful in deriving an algorithm for
regular expression pattern matching, and in doing so, answering an open
question posed by A.V. Aho in \cite[p.~342]{ah:pattern-matching-open}.
--
_____________________________________________________________________________
Bruce Watson
watson@win.tue.nl
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.