Re: [?] equivalence of regular expressions

bromage@cs.mu.OZ.AU (Andrew Bromage)
10 Aug 1998 10:18:24 -0400

From comp.compilers

Related articles
[?] equivalence of regular expressions qjackson@wave.home.com (Quinn Tyler Jackson) (1998-08-05)
*Re: [?] equivalence of regular expressions bromage@cs.mu.OZ.AU* (1998-08-10)**

| List of all articles for this month |

From:	bromage@cs.mu.OZ.AU (Andrew Bromage)
Newsgroups:	comp.compilers
Date:	10 Aug 1998 10:18:24 -0400
Organization:	Computer Science, The University of Melbourne
References:	98-08-036
Keywords:	lex, theory

G'day all.

"Quinn Tyler Jackson" <qjackson@wave.home.com> writes:

>What I want to be able to do during the compilation phase is determine
>whether a given child rule satisfies a parent rule, to prevent rules that
>could never match anything [...]

I think that what you actually want here is subsumption of regular
expressions, not equivalence. That is, you want to see if, given a
parent P and a child C, the regular set C is a subset of P. Or, using
regex notation, you want to see if C-P is the null set (where '-' is
set difference).

This is easy if you have a DFA/NFA construction algorithm which allows
'-' as an operator. If not, you can do it by constructing a DFA for P
and one for C, and comparing states, using a procedure similar to
unification over regular trees.

Naturally, there are more efficient algorithms, but all methods I could
think of boil down to optimisations of the above scheme, avoiding the
DFA construction. Most of them involve producing the DFA states lazily
as you do the check.

I've attached some sample code which runs in Mercury. (See
http://www.cs.mu.oz.au/mercury; the notation is almost identical
to Prolog modulo function syntax and declarations.) It's not
exhaustively tested, so use at your own risk. In particular, to be
"production quality", we should do local simplifications of the regular
expressions to avoid reductions and make the matching more efficient.)

I should also point out that this is quite a general technique to prove
all sorts of properties of given regular expressions, such as identity
and intersection. The code below could easily be adapted to do this
by replacing "difference" with whatever property you want to test for.

Cheers,
Andrew Bromage

--------8<---CUT HERE---8<---CAT HAIR---8<--------

:- module regex_subsumes.
:- interface.
:- import_module char.

:- type regex
---> or(regex, regex)
; star(regex)
; term(char)
; empty_string
; null_set
; and(regex, regex).

:- pred subsumes(regex, regex).
:- mode subsumes(in, in) is semidet.

:- implementation.
:- import_module std_util, set, list, bool.

:- type successor == pair(char, regex).

:- type dfa_state
---> dfa_state(
bool, % Final state?
list(successor) % Transitions
% (Sorted, no duplicate keys)
).

% union(Q0, Q1) returns the "union" of two DFA states.
%
:- func union(dfa_state, dfa_state) = dfa_state.

union(dfa_state(FinalA, TransA), dfa_state(FinalB, TransB))
= dfa_state(Final, union_2(TransA, TransB)) :-
bool:or(FinalA, FinalB, Final).

:- func union_2(list(successor), list(successor)) = list(successor).

union_2([], Ys) = Ys.
union_2([X | Xs], []) = [X | Xs].
union_2([X | Xs], [Y | Ys]) = Zs :-
X = TX - SX,
Y = TY - SY,
compare(Cmp, TX, TY),
( Cmp = (<),
Zs = [X | union_2(Xs, [Y | Ys])]
; Cmp = (>),
Zs = [Y | union_2([X | Xs], Ys)]
; Cmp = (=),
Zs = [TX - or(SX, SY) | union_2(Xs, Ys)]
).

% Apply simple reduction rules to turn a regex into a DFA
% state which accepts that regex. Kudos to Mark Hopkins who
% gave me the idea for these.
%
:- func reduce(regex) = dfa_state.

reduce(or(R1, R2)) = union(reduce(R1), reduce(R2)).
reduce(star(R)) = reduce(or(empty_string, and(R, star(R)))).
reduce(term(T)) = dfa_state(no, [T - empty_string]).
reduce(empty_string) = dfa_state(yes, []).
reduce(null_set) = dfa_state(no, []).
reduce(and(or(R1,R2),R3)) = reduce(or(and(R1, R3), and(R2, R3))).
reduce(and(star(R1),R2)) = reduce(or(R2, and(R1, and(star(R1), R2)))).
reduce(and(term(T),R)) = dfa_state(no, [T - R]).
reduce(and(empty_string, R)) = reduce(R).
reduce(and(null_set, _)) = dfa_state(no, []).
reduce(and(and(R1, R2), R3)) = reduce(and(R1, and(R2, R3))).

% difference(Q0, Q1) returns a set of "difference" constraints
% which, if satisfied, would prove that Q1 subsumes Q0. If we
% can prove that Q1 does not subsume Q0, fail. Luckily for us,
% all the constraints are of the form "E1-E2 is null", so this
% test can be applied recursively (but isn't, because we want
% to detect infinite recursion).
%
:- func difference(dfa_state :: in, dfa_state :: in)
= (list(pair(regex)) :: out) is semidet.

difference(dfa_state(FinalA, TransA), dfa_state(FinalB, TransB))
= difference_2(TransA, TransB) :-
( FinalA = yes => FinalB = yes ).

:- func difference_2(list(successor) :: in, list(successor) :: in)
= (list(pair(regex)) :: out) is det.

difference_2([], _) = [].
difference_2([_ - SX | Xs], [])
= [SX - null_set | difference_2(Xs, [])].
difference_2([X | Xs], [Y | Ys]) = Zs :-
X = TX - SX,
Y = TY - SY,
compare(Cmp, TX, TY),
( Cmp = (<),
Zs = [SX - null_set | difference_2(Xs, [Y | Ys])]
; Cmp = (>),
Zs = difference_2([X | Xs], Ys)
; Cmp = (=),
Zs = [SX - SY | difference_2(Xs, Ys)]
).

% difference(Q0, Q1) returns a set of "difference" constraints
% which, if satisfied, would prove that Q1 subsumes Q0. If we
% can prove that Q1 does not subsume Q0, fail.
%
:- func difference(dfa_state :: in, dfa_state :: in)
= (list(pair(regex)) :: out) is semidet.

difference(dfa_state(FinalA, TransA), dfa_state(FinalB, TransB))
= difference_2(TransA, TransB) :-
( FinalA = yes => FinalB = yes ).

:- func difference_2(list(successor) :: in, list(successor) :: in)
= (list(pair(regex)) :: out) is det.

difference_2([], _) = [].
difference_2([_ - SX | Xs], [])
= [SX - null_set | difference_2(Xs, [])].
difference_2([X | Xs], [Y | Ys]) = Zs :-
X = TX - SX,
Y = TY - SY,
compare(Cmp, TX, TY),
( Cmp = (<),
Zs = [SX - null_set | difference_2(Xs, [Y | Ys])]
; Cmp = (>),
Zs = difference_2([X | Xs], Ys)
; Cmp = (=),
Zs = [SX - SY | difference_2(Xs, Ys)]
).

% Given all this, the subsumption test is extremely simple.
% S1 subsumes S2 if S2-S1 is null. Make the two expressions
% into DFA states, work out what constraints should be satisfied
% to prove that the difference between the states are null, and
% recursively try them. Avoid infinite recursion by keeping
% track of what we've already looked at.
%
subsumes(S1, S2) :-
set:init(Seen),
subsumes_2([S2 - S1], Seen).

:- pred subsumes_2(list(pair(regex)), set(pair(regex))).
:- mode subsumes_2(in, in) is semidet.

subsumes_2([], _).
subsumes_2([S1 - S2 | States0], Seen0) :-
( set:member(S1 - S2, Seen0) ->
subsumes_2(States0, Seen0)
;
set:insert(Seen0, S1 - S2, Seen),
list:append(difference(reduce(S1), reduce(S2)),
States0, States),
subsumes_2(States, Seen)
).
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: [?] equivalence of regular expressions

bromage@cs.mu.OZ.AU (Andrew Bromage)10 Aug 1998 10:18:24 -0400

bromage@cs.mu.OZ.AU (Andrew Bromage)
10 Aug 1998 10:18:24 -0400