Re: How make multifinished DFA for merged regexps?

rockbrentwood@gmail.com
Sun, 29 Dec 2019 20:56:21 -0800 (PST)

          From comp.compilers

Related articles
[3 earlier articles]
Re: How make multifinished DFA for merged regexps? borucki.andrzej@gmail.com (Andy) (2019-12-20)
Re: How make multifinished DFA for merged regexps? 493-878-3164@kylheku.com (Kaz Kylheku) (2019-12-21)
How make multifinished DFA for merged regexps? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-23)
Re: How make multifinished DFA for merged regexps? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-12-24)
Re: How make multifinished DFA for merged regexps? matt.timmermans@gmail.com (Matt Timmermans) (2019-12-23)
Re: How make multifinished DFA for merged regexps? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-24)
Re: How make multifinished DFA for merged regexps? rockbrentwood@gmail.com (2019-12-29)
| List of all articles for this month |
From: rockbrentwood@gmail.com
Newsgroups: comp.compilers
Date: Sun, 29 Dec 2019 20:56:21 -0800 (PST)
Organization: Compilers Central
References: 19-12-005
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="7825"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, DFA
Posted-Date: 30 Dec 2019 11:30:03 EST
In-Reply-To: 19-12-005

On Thursday, December 19, 2019 at 10:01:24 PM UTC-6, Andy wrote:
> I can create DFA direct from regexp.
> But for language lexer I must have DFA for couple regexp.
> One solution is crating DFA with multi finished states.
> For example
> r0 = ab
> r1 = ac
>
> | 0 | 1
> a | 1 |
> b | | 2(F)
> c | | 3(F)
>
> How to check if r0 and r1 are disjoint?


In a day or so, I'll put back up on GitHub our regex utilities that had once
been up on the comp.compilers archive. The "DFA" program can process boolean
operations - including intersection and relative compliment. (The "NFA"
program produces efficient linear-sized near-deterministic FA, "REX" is like
GREP except it can process boolean combinations too, and "FSC" is a "finite
state classifier"). I'll post a link when it's up on GitHub.


Denote intersection by &. The regular expressions ab, ac are the least
fixed-point solutions to the following systems
[0] = ab >= a[1]
[1] = b >= b[2]
[2] = 1


[3] = ac >= a[4]
[4] = c >= c[5]
[5] = 1


The system for the intersection is obtained by distributivity:
[0]&[3] >= a([1]&[4])
[1]&[4] >= (b[2] | c0) & (b0 | c[2]) = b([2]&0) | c(0&[2]) = b0 | c0 = 0
The least fixed-point solution is [1]&[4] = 0, [0]&[3] = a0 = 0.
So, there's 0 intersection.


A more interesting example with relative compliment "-":
(a|b)* - (a|b)*(aa|bb)(a|b)*


The first term (a|b)* has the following right-linear system
[0] = (a|b)* = 1 | (a|b)(a|b)* = 1 | a(a|b)* | b(a|b)* >= 1 | a[0] | b[0]


The second term (a|b)*(aa|bb)(a|b)* has the following:
[1] = (a|b)*(aa|bb)(a|b)* = (aa|bb)(a|b)* | a[1] | b[1]
      = a(a(a|b)* | [1]) | b(b(a|b)* | [1])
      >= a[2] | b[3]
[2] = a(a|b)* | [1] = a(a|b)* | 1 | a[1] | b[1] = 1 | a((a|b)* | [1]) | b[1]
      >= 1 | a[4] | b[1]
[3] = b(a|b)* | [1] = b(a|b)* | 1 | a[1] | b[1] = 1 | a[1] | b((a|b)* | [1])
      >= 1 | a[1] | b[4]
[4] = (a|b)* | [1] = 1 | (a|b)(a|b)* | 1 | a[1] | b[1]
      = 1 | 1 | a((a|b)* | [1]) | b((a|b)* | [1])
      >= 1 | a[4] | b[4]


The system for the relative compliment can be found by distributivity:
[0]-[1] >= (1|a[0]|b[0]) - (0|a[2]|b[3])
      = 1-0 | a([0]-[2]) | b([0]-[3])
      = 1 | a([0]-[2]) | b([0]-[3])
[0]-[2] >= (1-1) | a([0]-[4]) | b([0]-[1]) = a([0]-[4]) | b([0]-[1])
[0]-[3] >= (1-1) | a([0]-[1]) | b([0]-[4]) = a([0]-[1]) | b([0]-[4])
[0]-[4] >= (1-1) | a([0]-[4]) | b([0]-[4]) = a([0]-[4]) | b([0]-[4])


The least fixed-point solution x >= ax | bx is x = 0, so for the last item, it
is [0]-[4] = 0. For the remaining items, the system therefore reduces to
[0]-[1] >= 1 | a([0]-[2]) | b([0]-[3])
[0]-[2] >= a0 | b([0]-[1]) = b([0]-[1])
[0]-[3] >= a([0]-[1]) | b0 = a([0]-[1])
which upon substitution reduces to a single inequality
[0]-[1] >= 1 | ab([0]-[1]) | ba([0]-[1]) = 1 | (ab|ba)([0]-[1]).


The least fixed point solution to x >= 1 | cx is x = c*. Thus, for [0]-[1], it
is
[0]-[1] = (ab|ba)*
Therefore, the relative compliment evaluates to:
(a|b)* - (a|b)*(aa|bb)(a|b)* = (ab|ba)*.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.