Sources available for interesting high-speed finite-state machines

behoffski@grouse.com.au (behoffski)
20 Nov 1997 22:40:51 -0500

          From comp.compilers

Related articles
Sources available for interesting high-speed finite-state machines behoffski@grouse.com.au (1997-11-20)
| List of all articles for this month |
From: behoffski@grouse.com.au (behoffski)
Newsgroups: comp.compilers
Date: 20 Nov 1997 22:40:51 -0500
Organization: Compilers Central
Keywords: FSM

G'day,


I've published a new finite-state machine technique in the November
issue of Dr.Dobb's Journal that you might be interested in. The
machine architecture combines table-driven finite-state machines with
threaded code to provide extremely high performance. This
architecture can be applied in many, many problem areas, and, where it
can be reasonably applied, can provide double or triple the
performance of contemporary code.


For example, the article presents three applications of the
architecture:
- a word/line count utility, with performance roughly
double that of the typical "C" implmentation,


- a basic regular expression search utility, called
Grouse Grep. This utility does a better job of
handling simple string searches, case-insensitive
matching and class matching than any other
utility I've seen. Some iterative searches,
such as x.*y.*z, run in nearly linear time due
to inter-table optimisations. While the utility
provides state-of-the-art performance in some
areas for nondeterministic FSAs, it cannot
always outperform deterministic FSAs. Sadly,
the demonstration version included in the
article cannot handle more than basic RE
elements (yet!)


- A very fast static Huffman decoder, which
can decompress a Huffman-decoded bitstream
for just 14 cycles per output byte on a
Pentium. This time does not include the time
required to set up the decoding tables (this
time is quite modest, typically of the order of
300k cycles). On a Pentium 150, that's 11
million output bytes a second (of course, this
ignores the effort needed to read in the bitstream
and to deal with the decompressed output).


The article presents code based on Intel 80x86 assembly. Even worse,
the applications are written to run in real mode under MS-DOS. Don't
let these limitations fool you -- the code is written almost entirely
in ANSI C and is intended to be easily portable to Unix, VMS etc. The
assembly technique works very nicely on the Intel chip, but is
completely general and can be ported (with varying degrees of
efficiency, sigh) to other architectures.


I've published complete source code along with the articles. The code
can be run entirely as a C program, although the performance is far
less than using the assembly versions of the FSA engine.


You can find the sources both on Dr. Dobbs' web site (file
grouse.zip):
http://ftp.ddj.com/ftp/1997/1997.11/


and on my company's web site, via ftp
(files ggrep101.zip, ghuff100.zip and wc102.zip):
http://www.grouse.com.au/
ftp://ftp.grouse.com.au/clients/grouse/


There are many, many other areas where this technique could be
applied. For example, it may provide a faster lexical analyser, and
might facilitate very fast table-driven compilers. Virus scanners
might be able to do a better job of scanning images. DNA sequence
searches might be able to search 4 bases packed in a byte for a cost
of around 20 cycles per byte.


Compiler writers will probably be able integrate my ideas into their
treatment of multi-way branch cases (e.g. switch ()...), leading to a
substantial improvement in performance for a modest increase in code
size.


The key to the architecture is that you can define your own state
tables, and define 20 or 30 actions that name how you react to
incoming bytes in each state. This allows extremely sophisticated
classification, merely by table lookup and code threading, of the
incoming bytes.


Best of all, the code is not merely freeware: I've placed any ideas of
the architecture that I've invented into the public domain. In
return, I would appreciate return donations of improved code and/or
new applications where you extend my code or find applications in new
areas. And if you really like my ideas and would like to see more
public-domain software from me (there's lots more that I could do!),
consider sending a donation to my company, Grouse Software.


cheers,


behoffski
--
Brenton Hoff (behoffski) | Software Engineer, Grouse Software Pty Ltd
behoffski@grouse.com.au | http://www.grouse.com.au/
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.