10 Aug 1998 10:18:24 -0400

Related articles |
---|

[?] equivalence of regular expressions qjackson@wave.home.com (Quinn Tyler Jackson) (1998-08-05) |

Re: [?] equivalence of regular expressions bromage@cs.mu.OZ.AU (1998-08-10) |

From: | bromage@cs.mu.OZ.AU (Andrew Bromage) |

Newsgroups: | comp.compilers |

Date: | 10 Aug 1998 10:18:24 -0400 |

Organization: | Computer Science, The University of Melbourne |

References: | 98-08-036 |

Keywords: | lex, theory |

G'day all.

"Quinn Tyler Jackson" <qjackson@wave.home.com> writes:

*>What I want to be able to do during the compilation phase is determine*

*>whether a given child rule satisfies a parent rule, to prevent rules that*

*>could never match anything [...]*

I think that what you actually want here is subsumption of regular

expressions, not equivalence. That is, you want to see if, given a

parent P and a child C, the regular set C is a subset of P. Or, using

regex notation, you want to see if C-P is the null set (where '-' is

set difference).

This is easy if you have a DFA/NFA construction algorithm which allows

'-' as an operator. If not, you can do it by constructing a DFA for P

and one for C, and comparing states, using a procedure similar to

unification over regular trees.

Naturally, there are more efficient algorithms, but all methods I could

think of boil down to optimisations of the above scheme, avoiding the

DFA construction. Most of them involve producing the DFA states lazily

as you do the check.

I've attached some sample code which runs in Mercury. (See

http://www.cs.mu.oz.au/mercury; the notation is almost identical

to Prolog modulo function syntax and declarations.) It's not

exhaustively tested, so use at your own risk. In particular, to be

"production quality", we should do local simplifications of the regular

expressions to avoid reductions and make the matching more efficient.)

I should also point out that this is quite a general technique to prove

all sorts of properties of given regular expressions, such as identity

and intersection. The code below could easily be adapted to do this

by replacing "difference" with whatever property you want to test for.

Cheers,

Andrew Bromage

--------8<---CUT HERE---8<---CAT HAIR---8<--------

*:- module regex_subsumes.*

*:- interface.*

*:- import_module char.*

*:- type regex*

---> or(regex, regex)

; star(regex)

; term(char)

; empty_string

; null_set

; and(regex, regex).

*:- pred subsumes(regex, regex).*

*:- mode subsumes(in, in) is semidet.*

*:- implementation.*

*:- import_module std_util, set, list, bool.*

*:- type successor == pair(char, regex).*

*:- type dfa_state*

---> dfa_state(

bool, % Final state?

list(successor) % Transitions

% (Sorted, no duplicate keys)

).

% union(Q0, Q1) returns the "union" of two DFA states.

%

*:- func union(dfa_state, dfa_state) = dfa_state.*

union(dfa_state(FinalA, TransA), dfa_state(FinalB, TransB))

= dfa_state(Final, union_2(TransA, TransB)) :-

bool:or(FinalA, FinalB, Final).

*:- func union_2(list(successor), list(successor)) = list(successor).*

union_2([], Ys) = Ys.

union_2([X | Xs], []) = [X | Xs].

union_2([X | Xs], [Y | Ys]) = Zs :-

X = TX - SX,

Y = TY - SY,

compare(Cmp, TX, TY),

( Cmp = (<),

Zs = [X | union_2(Xs, [Y | Ys])]

; Cmp = (>),

Zs = [Y | union_2([X | Xs], Ys)]

; Cmp = (=),

Zs = [TX - or(SX, SY) | union_2(Xs, Ys)]

).

% Apply simple reduction rules to turn a regex into a DFA

% state which accepts that regex. Kudos to Mark Hopkins who

% gave me the idea for these.

%

*:- func reduce(regex) = dfa_state.*

reduce(or(R1, R2)) = union(reduce(R1), reduce(R2)).

reduce(star(R)) = reduce(or(empty_string, and(R, star(R)))).

reduce(term(T)) = dfa_state(no, [T - empty_string]).

reduce(empty_string) = dfa_state(yes, []).

reduce(null_set) = dfa_state(no, []).

reduce(and(or(R1,R2),R3)) = reduce(or(and(R1, R3), and(R2, R3))).

reduce(and(star(R1),R2)) = reduce(or(R2, and(R1, and(star(R1), R2)))).

reduce(and(term(T),R)) = dfa_state(no, [T - R]).

reduce(and(empty_string, R)) = reduce(R).

reduce(and(null_set, _)) = dfa_state(no, []).

reduce(and(and(R1, R2), R3)) = reduce(and(R1, and(R2, R3))).

% difference(Q0, Q1) returns a set of "difference" constraints

% which, if satisfied, would prove that Q1 subsumes Q0. If we

% can prove that Q1 does not subsume Q0, fail. Luckily for us,

% all the constraints are of the form "E1-E2 is null", so this

% test can be applied recursively (but isn't, because we want

% to detect infinite recursion).

%

*:- func difference(dfa_state :: in, dfa_state :: in)*

= (list(pair(regex)) :: out) is semidet.

difference(dfa_state(FinalA, TransA), dfa_state(FinalB, TransB))

= difference_2(TransA, TransB) :-

( FinalA = yes => FinalB = yes ).

*:- func difference_2(list(successor) :: in, list(successor) :: in)*

= (list(pair(regex)) :: out) is det.

difference_2([], _) = [].

difference_2([_ - SX | Xs], [])

= [SX - null_set | difference_2(Xs, [])].

difference_2([X | Xs], [Y | Ys]) = Zs :-

X = TX - SX,

Y = TY - SY,

compare(Cmp, TX, TY),

( Cmp = (<),

Zs = [SX - null_set | difference_2(Xs, [Y | Ys])]

; Cmp = (>),

Zs = difference_2([X | Xs], Ys)

; Cmp = (=),

Zs = [SX - SY | difference_2(Xs, Ys)]

).

% difference(Q0, Q1) returns a set of "difference" constraints

% which, if satisfied, would prove that Q1 subsumes Q0. If we

% can prove that Q1 does not subsume Q0, fail.

%

*:- func difference(dfa_state :: in, dfa_state :: in)*

= (list(pair(regex)) :: out) is semidet.

difference(dfa_state(FinalA, TransA), dfa_state(FinalB, TransB))

= difference_2(TransA, TransB) :-

( FinalA = yes => FinalB = yes ).

*:- func difference_2(list(successor) :: in, list(successor) :: in)*

= (list(pair(regex)) :: out) is det.

difference_2([], _) = [].

difference_2([_ - SX | Xs], [])

= [SX - null_set | difference_2(Xs, [])].

difference_2([X | Xs], [Y | Ys]) = Zs :-

X = TX - SX,

Y = TY - SY,

compare(Cmp, TX, TY),

( Cmp = (<),

Zs = [SX - null_set | difference_2(Xs, [Y | Ys])]

; Cmp = (>),

Zs = difference_2([X | Xs], Ys)

; Cmp = (=),

Zs = [SX - SY | difference_2(Xs, Ys)]

).

% Given all this, the subsumption test is extremely simple.

% S1 subsumes S2 if S2-S1 is null. Make the two expressions

% into DFA states, work out what constraints should be satisfied

% to prove that the difference between the states are null, and

% recursively try them. Avoid infinite recursion by keeping

% track of what we've already looked at.

%

subsumes(S1, S2) :-

set:init(Seen),

subsumes_2([S2 - S1], Seen).

*:- pred subsumes_2(list(pair(regex)), set(pair(regex))).*

*:- mode subsumes_2(in, in) is semidet.*

subsumes_2([], _).

subsumes_2([S1 - S2 | States0], Seen0) :-

( set:member(S1 - S2, Seen0) ->

subsumes_2(States0, Seen0)

;

set:insert(Seen0, S1 - S2, Seen),

list:append(difference(reduce(S1), reduce(S2)),

States0, States),

subsumes_2(States, Seen)

).

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.