Re: Regular expression string searching & matching

Clint O <clint.olsen@gmail.com>
Mon, 12 Mar 2018 14:00:48 -0700 (PDT)

          From comp.compilers

From: Clint O <clint.olsen@gmail.com>
Newsgroups: comp.compilers
Date: Mon, 12 Mar 2018 14:00:48 -0700 (PDT)
Organization: Compilers Central
References: 18-03-016 18-03-032 18-03-034 18-03-035 18-03-041
Keywords: lex
Posted-Date: 12 Mar 2018 21:36:19 EDT

On Monday, March 12, 2018 at 1:19:29 PM UTC-7, Ben Hanson wrote:
> > /This/ actually worked for me (one character change):
> >
> > [/][*]([^*]|[*]+[^/])*[*]+[/]
>
> Your modified regex produces the following state machine:
>
[snip]
>
> Which will match
>
> /***/a*/
>
> in its entirety, when it should only match
>
> /***/
>
> Regards,
>
> Ben
> [Doesn't that depend on whether you interpret the END STATE in state 6 to
> stop even if there's more input? -John]


Interesting. I'm not seeing this behavior with the sample input you've
provided. Again, I'm willing to concede that I have a bug :) What I'm doing is
simulating the DFA until I reach an error state or hit EOF, remembering the
last accepting position along the way. So, this guarantees I'll report the
longest match I've found.
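
A minimal sketch of that longest-match loop, for illustration only. The
function names and the hand-written five-state DFA for C-style comments are
assumptions made for this example; they are not the automaton generated by
either program discussed in this thread.

```python
def longest_match(delta, start, accepting, text):
    """Return the length of the longest accepted prefix of text, or -1."""
    state = start
    last_accept = 0 if start in accepting else -1
    for i, ch in enumerate(text):
        state = delta(state, ch)
        if state is None:          # error state: no longer match is possible
            break
        if state in accepting:
            last_accept = i + 1    # remember the most recent accepting position
    return last_accept


# Hypothetical hand-written DFA for C-style comments (illustration only).
def comment_delta(state, ch):
    if state == 0:
        return 1 if ch == "/" else None
    if state == 1:
        return 2 if ch == "*" else None
    if state == 2:                 # inside the comment body
        return 3 if ch == "*" else 2
    if state == 3:                 # just saw one or more '*'
        if ch == "/":
            return 4
        return 3 if ch == "*" else 2
    return None                    # state 4 accepts and has no outgoing edges


print(longest_match(comment_delta, 0, {4}, "/***/a*/"))  # prints 5
```

Because the loop keeps going after an accepting state, whether a longer match
is reported depends entirely on whether the accepting state has outgoing
transitions.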


I could post the states that I come up with, but my state dumper also prints
out the RE it's currently processing (the actual expression). The successive
computation of derivatives can sometimes produce some rather abhorrent output,
and it's not always obvious (to me) what's going on. I'll work on a cleaner
presentation and try to post this.
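
For reference, a small sketch of derivative-based matching (Brzozowski
derivatives) applied to the regex quoted above. This is not my actual
implementation: the tuple encoding, the helper names, and the few
simplification rules are assumptions chosen to keep the example short. The
"state" after each character is just the derived expression.

```python
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches only the empty string

def cls(chars, negate=False):
    return ("cls", frozenset(chars), negate)

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def alt(r, s):
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("alt", r, s)

def star(r):
    return r if r[0] == "star" else ("star", r)

def plus(r):
    return cat(r, star(r))

def nullable(r):
    """True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return False                   # "empty" and "cls"

def deriv(r, c):
    """Derivative of r with respect to the single character c."""
    tag = r[0]
    if tag == "cls":
        return EPS if (c in r[1]) != r[2] else EMPTY
    if tag == "cat":
        left = cat(deriv(r[1], c), r[2])
        return alt(left, deriv(r[2], c)) if nullable(r[1]) else left
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    return EMPTY                   # "empty" and "eps"

def accepted_prefix_lengths(r, text):
    """Lengths of every prefix of text that r accepts."""
    lengths = [0] if nullable(r) else []
    for i, ch in enumerate(text):
        r = deriv(r, ch)
        if r == EMPTY:             # error state
            break
        if nullable(r):
            lengths.append(i + 1)
    return lengths

STAR, SLASH = cls("*"), cls("/")
# [/][*]([^*]|[*]+[^/])*[*]+[/]
body = star(alt(cls("*", negate=True), cat(plus(STAR), cls("/", negate=True))))
regex = cat(SLASH, cat(STAR, cat(body, cat(plus(STAR), SLASH))))
print(accepted_prefix_lengths(regex, "/***/a*/"))  # prints [5, 8]
```

Under this encoding the expression accepts both the 5-character prefix
/***/ and the full 8-character input, so a longest-match simulation would
report 8. With [^*/] in place of [^/] only the 5-character prefix is accepted.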


It also looks like you are running a DFA minimizer (like Hopcroft's) on your
result, since I am not producing a minimal DFA. Adding one may also help me
figure out if I'm producing the right automaton: the minimal DFA for a
language is unique up to state renaming, so our machines should match...
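
A rough sketch of what such a minimization step could look like. Note that
this is Moore-style partition refinement, which is simpler but slower than
Hopcroft's algorithm, and the function name, argument layout, and toy DFA are
all assumptions for illustration. It expects a total transition table and
does not prune unreachable states.

```python
def minimize(states, alphabet, delta, accepting):
    """Map each state to a block id; equivalent states share an id.

    delta must be total: delta[(state, symbol)] exists for every pair.
    """
    # Start with the accepting / non-accepting split.
    block = {s: (s in accepting) for s in states}
    while True:
        # Two states stay together only if they agree on their own block
        # and on the block reached for every symbol.
        signature = {
            s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
            for s in states
        }
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block       # no block was split: partition is stable
        block = new_block


# Toy DFA over {a, b} accepting strings that end in 'a'; states 1 and 2
# are deliberately redundant.
states = [0, 1, 2]
alphabet = ["a", "b"]
delta = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 1, (2, "b"): 0,
}
print(minimize(states, alphabet, delta, {1, 2}))  # prints {0: 0, 1: 1, 2: 1}
```

Two DFAs for the same language can then be compared by minimizing both and
checking that the resulting machines are the same up to renaming of states.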


Thanks,


-Clint

