re2c-1.0 released!

Ulya Trofimovich <skvadrik@gmail.com>
Sun, 27 Aug 2017 18:25:41 +0100

          From comp.compilers

Related articles
re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-08-27)
Re: re2c-1.0 released! 398-816-0742@kylheku.com (Kaz Kylheku) (2017-09-02)
Re: re2c-1.0 released! 398-816-0742@kylheku.com (Kaz Kylheku) (2017-09-02)
Re: re2c-1.0 released! anton@mips.complang.tuwien.ac.at (2017-09-02)
Re: re2c-1.0 released! gneuner2@comcast.net (George Neuner) (2017-09-02)
Re: re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-09-03)
Re: re2c-1.0 released! jamin.hanson@googlemail.com (Ben Hanson) (2017-09-03)
[6 later articles]
| List of all articles for this month |

Reroute: compilers@archive.iecc.com
From: Ulya Trofimovich <skvadrik@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 27 Aug 2017 18:25:41 +0100
Organization: Compilers Central
Injection-Info: miucha.iecc.com; posting-host="www.ucenet.org:2001:470:1f07:1126:0:7563:656e:6574"; logging-data="80784"; mail-complaints-to="abuse@iecc.com"
Keywords: tools, available, lex
Posted-Date: 27 Aug 2017 13:45:46 EDT
Content-Language: en-US

Hello everybody,




I'm glad to announce the new major release of re2c lexer generator, 1.0.
re2c specializes in generating fast and flexible lexers; it was written
by Peter Bumbulis around 1994 [0] and since then maintained by many
people. See release notes for the full story [1].


The main breakthrough of this release is the addition of fast submatch
extraction: re2c now supports capturing groups with POSIX longest-match
semantics, as well as standalone capturing "tags" with leftmost greedy
semantics. The implementation is based on the efficient novel algorithm [2].


The challenge of adding submatch extraction to a lexer generator is not
immediately obvious: this feature is provided by many regular expression
libraries and command-line tools like grep and sed. It may seem that
only a lack of effort prevents developers of lexer generators like Flex
from implementing it (as well as fixing the ever-broken trailing
contexts [3]).


The truth is, libraries and tools process regular expressions at run
time and use NFA-based algorithms when they need submatch extraction.
Lexer generators, on the other hand, compile regular expressions to DFA.
As it turns out, submatch extraction is inherently more complex than
recognition: it can be solved on NFA, but not on (ordinary) DFA.


The new algorithm [1] is based on the Tagged DFA invented by Ville
Laurikari in 2000 [4]; it also incorporates POSIX disambiguation
algorithm informally described by Kuklewicz in 2007 [5]. Any feedback on
the new algorithm would be much appreciated.




Ulya Trofimovich




[0] https://compilers.iecc.com/comparch/article/94-04-115


[1] http://re2c.org/news/release_notes/1_0.html


[2]
http://re2c.org/2017_trofimovich_tagged_deterministic_finite_automata_with_lookahead.pdf


[3] https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_node/flex_23.html


[4] https://laurikari.net/ville/spire2000-tnfa.pdf


[5] https://wiki.haskell.org/index.php?title=Regular_expressions/Bounded_space_proposal&oldid=11475


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.