Re: Interesting paper on regex NFA matching

Kaz Kylheku <433-929-6894@kylheku.com>
Sat, 27 Jan 2024 20:20:29 -0000

          From comp.compilers

Related articles
Interesting paper on regex NFA matching johnl@taugh.com (John R Levine) (2024-01-25)
Re: Interesting paper on regex NFA matching cross@spitfire.i.gajendra.net (2024-01-26)
Re: Interesting paper on regex NFA matching 433-929-6894@kylheku.com (Kaz Kylheku) (2024-01-26)
Re: Interesting paper on regex NFA matching christopher.f.clark@compiler-resources.com (Christopher F Clark) (2024-01-27)
Re: Interesting paper on regex NFA matching 433-929-6894@kylheku.com (Kaz Kylheku) (2024-01-27)
Re: Interesting paper on regex NFA matching christopher.f.clark@compiler-resources.com (Christopher F Clark) (2024-01-29)
Re: Interesting paper on regex NFA matching 433-929-6894@kylheku.com (Kaz Kylheku) (2024-01-29)
Re: Interesting paper on regex NFA matching christopher.f.clark@compiler-resources.com (Christopher F Clark) (2024-02-01)
Re: Interesting paper on regex NFA matching drikosev@gmail.com (Ev Drikos) (2024-02-04)
| List of all articles for this month |
From: Kaz Kylheku <433-929-6894@kylheku.com>
Newsgroups: comp.compilers
Date: Sat, 27 Jan 2024 20:20:29 -0000
Organization: Compilers Central
References: 24-01-006
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="81627"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, comment
Posted-Date: 27 Jan 2024 22:19:04 EST

On 2024-01-27, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> wrote:
>
>> (I personaly am not so interested in backtracking regexes; and they are
>> not directly applicable to compilers. You'd never want to design a
>> lexical analyzer that requires "lookaround" or whatnot in order to
>> recognize tokens. It's nice they are generating Arxiv papers for
>> someone; no genuine intellectual inquiry is invalid; good luck! It's
>> not applicable to the "Programming Languages" category it has been filed
>> under. I will skim through it, though.)
>
> You are mistaken that they are not applicable to programming languages.
>
> Even Pascal [originally] had cases where lookahead and backtracking
> are required to properly tokenize the input.
>
> The well-known case is "1." if followed by another "." it is two tokens,
> and if followed by a digit it is a floating point number.
> I don't recall what it should be if followed by any other character.


I think that case can be handled by technology like Lex trailing
contexts, which doesn't require backtracking.


    {digits}/. {
        // ... set up semantic value ..
        return INTEGER;
    }


    {digits}.{digits} {
        // ... set up semantic value ..
        return FLOAT;
    }


Another possibility is to recongize {digits}. as one lexeme and call
unput('.'). That's a minor form of backtracking, that doesn't require us
to do anything funny to the regex implementation.


--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


[Patterns with alternatives and trailing contexts do have to back up
and the flex manual warns it's quite expensive. See the
PERFORMANCE CONSIDERATIONS and DEFICIENCIES parts of the man page. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.