Re: Regular expressions & finite automata. (Hank Dietz)
Fri, 24 Aug 90 17:18:18 -0500

          From comp.compilers

Related articles
Regular expressions & finite automata. (Mike Harrison) (1990-08-15)
Regular expressions & finite automata. (1990-08-23)
Re: Regular expressions & finite automata. (1990-08-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (Hank Dietz)
Summary: Ambiguity on state merging
Keywords: lex, DFA
Organization: Purdue University Engineering Computer Network
References: <>
Date: Fri, 24 Aug 90 17:18:18 -0500

In article <> (Dale Worley) writes:
> [There are many equivalent DFAs that a regular expression can translate to,
> and vice versa. After all, any regular expression can be written in
> infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really
> pedantic, x vs. (x|x) vs. (x|x|x) ... -John]
>However, all DFAs for a given regular language can be derived from the
>minimal DFA (the one with the fewest states that accepts that
>language) by turning single states into sets of equivalent states.

First, even without state merging, the DFAs generated for (x|x|x) and x
are IDENTICAL. Still, state equivalence and merging are important
concepts. Which reminds me of a strange and maybe useful thing I did a
while back... I'm wondering if others think it useful also....

Basically, if someone gives you a DFA, it is possible to perform merging
of equivalent states to try to reduce it to the minimal form. Two states
A and B are equivalent if they have the same accept action and, for each
transition from A to C labeled L, there is a corrseponding transition from
B also labeled L and going to a state D| D is equivalent to C. That's the
"standard" definition and, for example, it will correctly reduce the DFA
for (x*|x) into the DFA for x*. Nothing tricky... I've implemented it
and it works fine.

Now, my little tweak was to recognize that, provided error detection is
not needed, states can be considered equivalent if they don't have
conflicting exit arcs and the accept action is either the same or is the
error accept action. I call this "unsafe" state merging. The result is
often a much simpler recognizer, but without error detection ability. For
example, the lex-like specification:

a*b { ACTION1 }
aaaac { ACTION2 }

would generate a DFA which actually corresponded to:

a*b { ACTION1 }
a*c { ACTION2 }

I figured that this might be useful because it effectively reduces the
patterns to the minimal trailing components which distinguish them. That
clearly results in fewer states for the DFA... although my experiments
show that it doesn't save all that many states for more complex patterns.

So, what's the verdict? Is this "unsafe" state merging idea useful? Has
it been done before? (If not, is it worth publishing?)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.