re2c-1.0 released!

Ulya Trofimovich <>
Sun, 27 Aug 2017 18:25:41 +0100

          From comp.compilers

Related articles
re2c-1.0 released! (Ulya Trofimovich) (2017-08-27)
Re: re2c-1.0 released! (Kaz Kylheku) (2017-09-02)
Re: re2c-1.0 released! (Kaz Kylheku) (2017-09-02)
Re: re2c-1.0 released! (2017-09-02)
Re: re2c-1.0 released! (George Neuner) (2017-09-02)
Re: re2c-1.0 released! (Ulya Trofimovich) (2017-09-03)
Re: re2c-1.0 released! (Ben Hanson) (2017-09-03)
[6 later articles]
| List of all articles for this month |

From: Ulya Trofimovich <>
Newsgroups: comp.compilers
Date: Sun, 27 Aug 2017 18:25:41 +0100
Organization: Compilers Central
Injection-Info:; posting-host=""; logging-data="80784"; mail-complaints-to=""
Keywords: tools, available, lex
Posted-Date: 27 Aug 2017 13:45:46 EDT
Content-Language: en-US

Hello everybody,

I'm glad to announce the new major release of re2c lexer generator, 1.0.
re2c specializes in generating fast and flexible lexers; it was written
by Peter Bumbulis around 1994 [0] and since then maintained by many
people. See release notes for the full story [1].

The main breakthrough of this release is the addition of fast submatch
extraction: re2c now supports capturing groups with POSIX longest-match
semantics, as well as standalone capturing "tags" with leftmost greedy
semantics. The implementation is based on the efficient novel algorithm [2].

The challenge of adding submatch extraction to a lexer generator is not
immediately obvious: this feature is provided by many regular expression
libraries and command-line tools like grep and sed. It may seem that
only a lack of effort prevents developers of lexer generators like Flex
from implementing it (as well as fixing the ever-broken trailing
contexts [3]).

The truth is, libraries and tools process regular expressions at run
time and use NFA-based algorithms when they need submatch extraction.
Lexer generators, on the other hand, compile regular expressions to DFA.
As it turns out, submatch extraction is inherently more complex than
recognition: it can be solved on NFA, but not on (ordinary) DFA.

The new algorithm [1] is based on the Tagged DFA invented by Ville
Laurikari in 2000 [4]; it also incorporates POSIX disambiguation
algorithm informally described by Kuklewicz in 2007 [5]. Any feedback on
the new algorithm would be much appreciated.

Ulya Trofimovich







Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.