Re: working with very large Finite State Machines

Ralph Boland <rboland@unb.ca>
26 Apr 2001 21:15:17 -0400

          From comp.compilers

Related articles
working with very large Finite State Machines sjbradtke@home.com (Steve Bradtke) (2001-04-22)
Re: working with very large Finite State Machines vannoord@let.rug.nl (2001-04-26)
Re: working with very large Finite State Machines chase@world.std.com (David Chase) (2001-04-26)
Re: working with very large Finite State Machines rboland@unb.ca (Ralph Boland) (2001-04-26)
Re: working with very large Finite State Machines rbates@southwind.net (Rodney M. Bates) (2001-04-29)
| List of all articles for this month |

From: Ralph Boland <rboland@unb.ca>
Newsgroups: comp.ai.nat-lang,comp.compilers
Date: 26 Apr 2001 21:15:17 -0400
Organization: University of New Brunswick
References: 01-04-117
Keywords: lex
Posted-Date: 26 Apr 2001 21:15:17 EDT

Steve Bradtke wrote:
> I am working on a project that involves the construction of
> very large Finite State Machines (up to approximately 10^7 states).


Do you have a corresponding regular expression (RE) for these
machines? If so you could then convert your regular expression into a
grammar and run a parser on it. (I strongly recommend a parser that
properly supports regular right part grammars since this can reduce
many problems with ambiguities). Now you are using a stack to
recognize your FSM/RE/Grammar and as a result the parse table can be
much smaller than the FSM.


This idea may get you to a much smaller machine but it has problems.


1) parsing is slower.


2) Converting your RE into a grammar introduces problems with among
other things ambiguities. Note that if the grammar is based upon a RE
then whether or not the grammar is ambiguous can be determined in
polynomial time. However even if the grammar is not ambigouus the
parser tool may not be able to disambiguate (?) the grammar because
of inadequate lookahead. (for some unambigouos REs arbitrarily large
lookahead is required)


In part because your FSM is so large the possibility of failure here
is large. But so are the potential savings in table size.


It would seem like an interesting project to build a tool that does
all of this automatically. Could be Masters or Ph.D. scale. Note
that the algorithm for testing if a RE is ambiguous (some accepted
strings can be accepted through more than one path through the RE) is
n*n*n*n i.e. n^4 where n is the size of the RE.


Hope this helps. Hope everything I said is correct.


One last note. I would think the best way to do this would be to
construct a minimized version of your finite state machine first as
this should help reduce problems with ambiguities (I think). Of
course this is exactly what you were trying to do in the first place.


Ralph Boland


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.