Related articles |
---|
re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-08-27) |
Re: re2c-1.0 released! 398-816-0742@kylheku.com (Kaz Kylheku) (2017-09-02) |
Re: re2c-1.0 released! 398-816-0742@kylheku.com (Kaz Kylheku) (2017-09-02) |
Re: re2c-1.0 released! anton@mips.complang.tuwien.ac.at (2017-09-02) |
Re: re2c-1.0 released! gneuner2@comcast.net (George Neuner) (2017-09-02) |
Re: re2c-1.0 released! skvadrik@gmail.com (Ulya Trofimovich) (2017-09-03) |
Re: re2c-1.0 released! jamin.hanson@googlemail.com (Ben Hanson) (2017-09-03) |
[6 later articles] |
Reroute: | compilers@archive.iecc.com |
From: | Ulya Trofimovich <skvadrik@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 27 Aug 2017 18:25:41 +0100 |
Organization: | Compilers Central |
Injection-Info: | miucha.iecc.com; posting-host="www.ucenet.org:2001:470:1f07:1126:0:7563:656e:6574"; logging-data="80784"; mail-complaints-to="abuse@iecc.com" |
Keywords: | tools, available, lex |
Posted-Date: | 27 Aug 2017 13:45:46 EDT |
Content-Language: | en-US |
Hello everybody,
I'm glad to announce the new major release of re2c lexer generator, 1.0.
re2c specializes in generating fast and flexible lexers; it was written
by Peter Bumbulis around 1994 [0] and since then maintained by many
people. See release notes for the full story [1].
The main breakthrough of this release is the addition of fast submatch
extraction: re2c now supports capturing groups with POSIX longest-match
semantics, as well as standalone capturing "tags" with leftmost greedy
semantics. The implementation is based on the efficient novel algorithm [2].
The challenge of adding submatch extraction to a lexer generator is not
immediately obvious: this feature is provided by many regular expression
libraries and command-line tools like grep and sed. It may seem that
only a lack of effort prevents developers of lexer generators like Flex
from implementing it (as well as fixing the ever-broken trailing
contexts [3]).
The truth is, libraries and tools process regular expressions at run
time and use NFA-based algorithms when they need submatch extraction.
Lexer generators, on the other hand, compile regular expressions to DFA.
As it turns out, submatch extraction is inherently more complex than
recognition: it can be solved on NFA, but not on (ordinary) DFA.
The new algorithm [1] is based on the Tagged DFA invented by Ville
Laurikari in 2000 [4]; it also incorporates POSIX disambiguation
algorithm informally described by Kuklewicz in 2007 [5]. Any feedback on
the new algorithm would be much appreciated.
Ulya Trofimovich
[0] https://compilers.iecc.com/comparch/article/94-04-115
[1] http://re2c.org/news/release_notes/1_0.html
[2]
http://re2c.org/2017_trofimovich_tagged_deterministic_finite_automata_with_lookahead.pdf
[3] https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_node/flex_23.html
[4] https://laurikari.net/ville/spire2000-tnfa.pdf
[5] https://wiki.haskell.org/index.php?title=Regular_expressions/Bounded_space_proposal&oldid=11475
Return to the
comp.compilers page.
Search the
comp.compilers archives again.