Re: re2c-1.0 released!

Ulya Trofimovich <skvadrik@gmail.com>
Sun, 3 Sep 2017 18:31:39 +0100

          From comp.compilers

Related articles
[4 earlier articles]
Re: re2c-1.0 released! gneuner2@comcast.net (George Neuner) (2017-09-02)
Re: re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-09-03)
Re: re2c-1.0 released! jamin.hanson@googlemail.com (Ben Hanson) (2017-09-03)
Re: re2c-1.0 released! jamin.hanson@googlemail.com (Ben Hanson) (2017-09-03)
Re: re2c-1.0 released! 398-816-0742@kylheku.com (Kaz Kylheku) (2017-09-03)
Re: re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-09-03)
Re: re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-09-03)
Re: re2c-1.0 released! jamin.hanson@googlemail.com (Ben Hanson) (2017-09-04)
Re: re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-09-08)
| List of all articles for this month |

From: Ulya Trofimovich <skvadrik@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 3 Sep 2017 18:31:39 +0100
Organization: Compilers Central
References: 17-08-007 17-09-001 17-09-003
Keywords: lex
Content-Language: en-US

> Thanks for this. I will take a look at the implementation when I get
> the time.
>
> I'm most interested in your refinements/bug fixes to the original
> TDFA algorithms and I would like to see this used as a runtime regex
> library more than for a lexer generator.


Using TDFA in a runtime library has certain downsides:


1. It implies the runtime overhead on (lazy) determinization: it's only
useful if the regexp is compiled once and used many times. Libraries
such as RE2 (by Russ Cox) or TRE (by Ville Laurikari) mostly use NFA for
submatching.


2. Also, due to ahead-of-time compilation lexer generators are able to
perform extensive optimization of the generated code, including
minimization of the number of submatch variables and operations on them.
A runtime library cannot afford that.


However, if TDFA algorithm is to be used in a library, then my paper
contains two improvements worth consideration:


1. TDFA(1) with one-symbol lookahead instead of default TDFA(0)


2. Formal algorithm for POSIX disambiguation semantics.


I'd be glad to answer any questions anyway.




Ulya



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.