Related articles |
---|
[3 earlier articles] |
Re: How make multifinished DFA for merged regexps? borucki.andrzej@gmail.com (Andy) (2019-12-20) |
Re: How make multifinished DFA for merged regexps? 493-878-3164@kylheku.com (Kaz Kylheku) (2019-12-21) |
How make multifinished DFA for merged regexps? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-23) |
Re: How make multifinished DFA for merged regexps? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-12-24) |
Re: How make multifinished DFA for merged regexps? matt.timmermans@gmail.com (Matt Timmermans) (2019-12-23) |
Re: How make multifinished DFA for merged regexps? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-24) |
Re: How make multifinished DFA for merged regexps? rockbrentwood@gmail.com (2019-12-29) |
From: | rockbrentwood@gmail.com |
Newsgroups: | comp.compilers |
Date: | Sun, 29 Dec 2019 20:56:21 -0800 (PST) |
Organization: | Compilers Central |
References: | 19-12-005 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="7825"; mail-complaints-to="abuse@iecc.com" |
Keywords: | lex, DFA |
Posted-Date: | 30 Dec 2019 11:30:03 EST |
In-Reply-To: | 19-12-005 |
On Thursday, December 19, 2019 at 10:01:24 PM UTC-6, Andy wrote:
> I can create DFA direct from regexp.
> But for language lexer I must have DFA for couple regexp.
> One solution is crating DFA with multi finished states.
> For example
> r0 = ab
> r1 = ac
>
> | 0 | 1
> a | 1 |
> b | | 2(F)
> c | | 3(F)
>
> How to check if r0 and r1 are disjoint?
In a day or so, I'll put back up on GitHub our regex utilities that had once
been up on the comp.compilers archive. The "DFA" program can process boolean
operations - including intersection and relative compliment. (The "NFA"
program produces efficient linear-sized near-deterministic FA, "REX" is like
GREP except it can process boolean combinations too, and "FSC" is a "finite
state classifier"). I'll post a link when it's up on GitHub.
Denote intersection by &. The regular expressions ab, ac are the least
fixed-point solutions to the following systems
[0] = ab >= a[1]
[1] = b >= b[2]
[2] = 1
[3] = ac >= a[4]
[4] = c >= c[5]
[5] = 1
The system for the intersection is obtained by distributivity:
[0]&[3] >= a([1]&[4])
[1]&[4] >= (b[2] | c0) & (b0 | c[2]) = b([2]&0) | c(0&[2]) = b0 | c0 = 0
The least fixed-point solution is [1]&[4] = 0, [0]&[3] = a0 = 0.
So, there's 0 intersection.
A more interesting example with relative compliment "-":
(a|b)* - (a|b)*(aa|bb)(a|b)*
The first term (a|b)* has the following right-linear system
[0] = (a|b)* = 1 | (a|b)(a|b)* = 1 | a(a|b)* | b(a|b)* >= 1 | a[0] | b[0]
The second term (a|b)*(aa|bb)(a|b)* has the following:
[1] = (a|b)*(aa|bb)(a|b)* = (aa|bb)(a|b)* | a[1] | b[1]
= a(a(a|b)* | [1]) | b(b(a|b)* | [1])
>= a[2] | b[3]
[2] = a(a|b)* | [1] = a(a|b)* | 1 | a[1] | b[1] = 1 | a((a|b)* | [1]) | b[1]
>= 1 | a[4] | b[1]
[3] = b(a|b)* | [1] = b(a|b)* | 1 | a[1] | b[1] = 1 | a[1] | b((a|b)* | [1])
>= 1 | a[1] | b[4]
[4] = (a|b)* | [1] = 1 | (a|b)(a|b)* | 1 | a[1] | b[1]
= 1 | 1 | a((a|b)* | [1]) | b((a|b)* | [1])
>= 1 | a[4] | b[4]
The system for the relative compliment can be found by distributivity:
[0]-[1] >= (1|a[0]|b[0]) - (0|a[2]|b[3])
= 1-0 | a([0]-[2]) | b([0]-[3])
= 1 | a([0]-[2]) | b([0]-[3])
[0]-[2] >= (1-1) | a([0]-[4]) | b([0]-[1]) = a([0]-[4]) | b([0]-[1])
[0]-[3] >= (1-1) | a([0]-[1]) | b([0]-[4]) = a([0]-[1]) | b([0]-[4])
[0]-[4] >= (1-1) | a([0]-[4]) | b([0]-[4]) = a([0]-[4]) | b([0]-[4])
The least fixed-point solution x >= ax | bx is x = 0, so for the last item, it
is [0]-[4] = 0. For the remaining items, the system therefore reduces to
[0]-[1] >= 1 | a([0]-[2]) | b([0]-[3])
[0]-[2] >= a0 | b([0]-[1]) = b([0]-[1])
[0]-[3] >= a([0]-[1]) | b0 = a([0]-[1])
which upon substitution reduces to a single inequality
[0]-[1] >= 1 | ab([0]-[1]) | ba([0]-[1]) = 1 | (ab|ba)([0]-[1]).
The least fixed point solution to x >= 1 | cx is x = c*. Thus, for [0]-[1], it
is
[0]-[1] = (ab|ba)*
Therefore, the relative compliment evaluates to:
(a|b)* - (a|b)*(aa|bb)(a|b)* = (ab|ba)*.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.