Re: Non-deter. regular expr. -> deter. regular exp

mark@omnifest.uwm.edu (Mark Hopkins)27 Mar 1996 00:07:21 -0500

From comp.compilers

Related articles
Non-deter. regular expr. -> deter. regular expr. faase@cs.utwente.nl (1996-03-20)
Re: Non-deter. regular expr. -> deter. regular expr. leichter@smarts.com (Jerry Leichter) (1996-03-22)
Re: Non-deter. regular expr. -> deter. regular exp mark@omnifest.uwm.edu (1996-03-27)
Re: Non-deter. regular expr. -> deter. regular exp mark@omnifest.uwm.edu (1996-03-27)
| List of all articles for this month |

 From: mark@omnifest.uwm.edu (Mark Hopkins) Newsgroups: comp.compilers Date: 27 Mar 1996 00:07:21 -0500 Organization: Omnifest References: 96-03-125 96-03-152 Keywords: DFA, theory

As far as I know, there is no such thing as a "non-deterministic"
regular expression, since every r.e. can be recognized by a
deterministic finite state automaton. So I have to interpret your
question to mean: Is there a purely algebraic way to extract a DFA
from a regular expression? There is.

A regular expression can be represented as a finite sum of terms,
each of which is a finite product of factors. A factor is either a
terminal or a starred regular expression. The type of algebra that
applies to regular expressions is called a semi-ring. Its axioms are:

a(bc) = (ab)c a+(b+c) = (a+b)+c
a1 = a = 1a a+0 = a = 0+a
a0d = 0 a+b = b+a
a(b+c)d = abd+acd

In addition, you have the property: a + a = a, which makes the algebra
an idempotent semi-ring. An expression in an idempotent semi-ring can
be represented as a finite set of terms, with each term a finite
sequence of factors and "factor" defined as above. Here, I'm using
the notation:

a + b <--> a union b
1 <--> empty string
0 <--> empty set

Therefore, by using the axioms in the following forms:

Set Manipulations:
a --> 0 + a
(a + b) + c --> (a + c) + b to sort
a + a --> a to remove duplicates.
a + 0 --> a to remove trailing zeros.
a + (b + c) --> (a + b) + c to associate to the left.
Sequence Manipulations:
a --> 1a
a1 --> a to remove trailing ones.
(ab)c --> a(bc) to associate to the right.
Distributivity:
a0 --> 0
a(b + c) --> ab + ac

you arrive at a normal form for expressions that can be mapped one-to-one to
the "finite set of finite sequence" form described above.

Example: a* ((b + c) d) + e (1 (f + g))
--> ((((0 + ((1 a*) b) d) + ((1 a*) c) d) + (1 e) f) + (1 e) g
= a* b d + a* c d + e f + e g
<-->
{ (a*, b, d), (a*, c, d), (e, f), (e, g) }

by repeated applications of the steps above, where I'm assuming that the

Afterwards you can strip off any leading 0's and 1's with the reductions:

0 + a --> a
1a --> a

You can convert each expression into a system of equations which it will be
the minimal solution to. Each equation in the system will have the form:

Taylor's Theorem:
E = E_0 + x dE/dx + y dE/dy + ... + z dE/dz

where x, y, ..., z are the terminals, and E_0 is either 0 or 1. This is
called Right-Linear form, since all the terminals (or "variables") are on
the right-hand side. The right-hand side will be "deterministic" in the
sense that no more than one term will contain a given terminal after it is
reduced to normal form.

E_0 is obtained by substituting 0 in for all the terminals in E.

d/dx is a "partial differential operator" and is defined by the following:

d1/dx = d0/dx = 0
dx/dx = 1, dy/dx = 0 if y is a terminal other than x.
d(A*)/dx = dA/dx A*
d(AB)/dx = dA/dx B + A_0 dB/dx
d(A+B)/dx = dA/dx + dB/dx

Note the variation in the product rule. You can easily extend this to the
other operations [A] = 1 + A and A+ = A A*, by the following:

d[A]/dx = d(1 + A)/dx = d1/dx + dA/dx = 0 + dA/dx = dA/dx
d(A+)/dx = dA/dx A* + A_0 dA/dx A*
= (1 + A_0) dA/dx A*
= dA/dx A*

since 1 + A_0 is either 1 + 0 = 1, or 1 + 1 = 1.

Define the total differential dE as the sum of all the terms x dE/dx. Then
you can write:
E = E_0 + dE
with
d0 = d1 = 0
dx = x, for all terminals x
d(A*) = d(A+) = dA A*
d([A]) = dA
d(AB) = dA B + A_0 dB
d(A+B) = dA + dB

Also, you can define E_0 by the following table:

0_0 = x_0 = 0
1_0 = [A]_0 = (A*)_0 = 1
(A+)_0 = A_0
(A+B)_0 = A_0 + B_0
(AB)_0 = A_0 B_0

which is equivalent to making the above-mentioned substitutions.

So apply Taylor's Theorem and reduce the result to normal form.

Example: a* (b a*)*
(a* (b a*)*)_0 = 0* (0 0*)* = 1 (0)* = 1

d(a* (b a*)*) = a (a* (b a*)*) + (b a*) (b a*)*
--> a (a* (b a*)*) + b (a* (b a*)*)

Which establishes a* (b a*)* as a solution to a system consisting of only
one equation, of the form:

x = 1 + a x + b x

In general, there will be more than one expression and more than one equation.
When arriving at the result:

E = E_0 + x dE/dx + y dE/dy + ... + z dE/dz

each of the derivatives dE/dx, dE/dy and dE/dz will have to be likewise
reduced if they already haven't. This is where the extra equations will
come from.

The automaton is read off this as follows:

(1) Each expression is a state
(2) State E is final if E_0 = 1.
(3) State E has a transition to state F on x if dE/dx = F.

---------------

To actually prove that each expression is a solution to its corresponding
system of equations (and that it's also the least solution), you need to
extend the algebra by adding in the following axioms:

1 + a a* = a*
if a + bx + xc <= x then b* a c* <= x

where (u <= v) is defined by (u + v = v).

From this you can ultimately show that

0* = 1* = 1
a* = 1 + a a* = 1 + a* a

and other similar properties, as well as showing that the following equations
have the following least solutions:

x = a + bx x = b* a
x = a + xc x = a c*
x = a + bx + xc x = b* a c*
x = a + xdx x = a (d a)* = (a d)* a
x = a + bx + xc + xdx x = b* a c* (d b* a c*)*

For the example above, this implies that:

x = 1 + ax + bx = 1 + (a + b)x

has (a + b)* as a least solution, whose equality to a* (b a*)* is also
proveable from the two axioms above.

--

Post a followup to this message