Fri, 24 Aug 90 17:18:18 -0500

Related articles |
---|

Regular expressions & finite automata. mph@inmos.com (Mike Harrison) (1990-08-15) |

Regular expressions & finite automata. worley@compass.com (1990-08-23) |

Re: Regular expressions & finite automata. hankd@dynamo.ecn.purdue.edu (1990-08-24) |

Newsgroups: | comp.compilers |

From: | hankd@dynamo.ecn.purdue.edu (Hank Dietz) |

Summary: | Ambiguity on state merging |

Keywords: | lex, DFA |

Organization: | Purdue University Engineering Computer Network |

References: | <9008231529.AA26343@sn1987a.compass.com> |

Date: | Fri, 24 Aug 90 17:18:18 -0500 |

In article <9008231529.AA26343@sn1987a.compass.com> worley@compass.com (Dale Worley) writes:

*>*

*> [There are many equivalent DFAs that a regular expression can translate to,*

*> and vice versa. After all, any regular expression can be written in*

*> infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really*

*> pedantic, x vs. (x|x) vs. (x|x|x) ... -John]*

*>*

*>However, all DFAs for a given regular language can be derived from the*

*>minimal DFA (the one with the fewest states that accepts that*

*>language) by turning single states into sets of equivalent states.*

First, even without state merging, the DFAs generated for (x|x|x) and x

are IDENTICAL. Still, state equivalence and merging are important

concepts. Which reminds me of a strange and maybe useful thing I did a

while back... I'm wondering if others think it useful also....

Basically, if someone gives you a DFA, it is possible to perform merging

of equivalent states to try to reduce it to the minimal form. Two states

A and B are equivalent if they have the same accept action and, for each

transition from A to C labeled L, there is a corrseponding transition from

B also labeled L and going to a state D| D is equivalent to C. That's the

"standard" definition and, for example, it will correctly reduce the DFA

for (x*|x) into the DFA for x*. Nothing tricky... I've implemented it

and it works fine.

Now, my little tweak was to recognize that, provided error detection is

not needed, states can be considered equivalent if they don't have

conflicting exit arcs and the accept action is either the same or is the

error accept action. I call this "unsafe" state merging. The result is

often a much simpler recognizer, but without error detection ability. For

example, the lex-like specification:

a*b { ACTION1 }

aaaac { ACTION2 }

would generate a DFA which actually corresponded to:

a*b { ACTION1 }

a*c { ACTION2 }

I figured that this might be useful because it effectively reduces the

patterns to the minimal trailing components which distinguish them. That

clearly results in fewer states for the DFA... although my experiments

show that it doesn't save all that many states for more complex patterns.

So, what's the verdict? Is this "unsafe" state merging idea useful? Has

it been done before? (If not, is it worth publishing?)

-hankd@ecn.purdue.edu

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.