From: | Ben Hanson <jamin.hanson@googlemail.com> |
Newsgroups: | comp.compilers |
Date: | Sat, 18 May 2019 13:24:40 +0100 |
Organization: | Compilers Central |
References: | 19-05-092 |
Keywords: | parse, lex, DFA, comment |
Posted-Date: | 20 May 2019 17:55:59 EDT |
Content-Language: | en-US |
>[Originally here
>https://commandcenter.blogspot.com/2011/08/regular-expressions-in-lexing-and.html
>It took me a minute to figure out what he was saying, since the patterns in a lexer are regular
>expressions. I believe the point is not to use general purpose regex libraries but rather to
>use something like flex or re2c which will take a set of expressions and actions and
>precompile them. -John]
I have my doubts due to:
>Consider finding
>alphanumeric identifiers. It's not too hard to write the regexp (something
>like "[a-ZA-Z_][a-ZA-Z_0-9]*"), but really not much harder to write as a
>simple loop. The performance of the loop, though, will be much higher and will
>involve much less code under the covers.
and
>And when we want to adjust our lexer to admit other character
>types, such as Unicode identifiers, and handle normalization, and so on, the
>hand-written loop can cope easily but the regexp approach will break down.
It doesn't help that flex *still* doesn't support Unicode, but according
to http://re2c.org/manual/features/encodings/encodings.html re2c does.
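For concreteness, here is a minimal sketch of the two approaches the quoted
text compares for ASCII identifiers: a hand-written loop and a general-purpose
regex library (std::regex here). The function names are hypothetical, and the
pattern assumes the quoted "a-Z" was meant as "a-z". It illustrates the
trade-off only; it is not a benchmark and makes no attempt at Unicode.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// ASCII-only character tests (assumption: no Unicode, no locale handling).
static bool is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static bool is_ident_part(char c) {
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

// Hand-written loop: length of the identifier starting at pos, or 0 if none.
std::size_t scan_identifier_loop(const std::string& s, std::size_t pos) {
    if (pos >= s.size() || !is_ident_start(s[pos])) return 0;
    std::size_t end = pos + 1;
    while (end < s.size() && is_ident_part(s[end])) ++end;
    return end - pos;
}

// Same job with a general-purpose regex library. The pattern is compiled
// once, at run time, on first call.
std::size_t scan_identifier_regex(const std::string& s, std::size_t pos) {
    static const std::regex ident("[a-zA-Z_][a-zA-Z_0-9]*");
    if (pos > s.size()) return 0;
    std::smatch m;
    // match_continuous anchors the match at the start of the searched range.
    if (std::regex_search(s.begin() + pos, s.end(), m, ident,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::size_t>(m.length(0));
    }
    return 0;
}
```

Both functions return the same lengths on ASCII input, e.g. 5 for "foo_1 = 2"
at position 0. The difference lies in what runs underneath: the loop is a few
comparisons per character, while the regex version goes through the library's
matching engine.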
My view is that just as regex libraries have been embraced by most
languages (even C++ has std::regex now), the same should happen for
lexer and parser generators. This is the approach I have taken with
lexertl and parsertl. Hana Dusíková is doing interesting work in the
compile time regex space with
https://github.com/hanickadot/compile-time-regular-expressions .
Apparently she is working on a DFA version this year, which will be
presented at C++Now - http://cppnow.org/history/2019/schedule/ . I'm
hoping that this new version can be used as a lexer generator and then I
can switch to it for a lot of my lexing needs.
Regards,
Ben
[I agree that once you have some way to embed precompiled DFAs in your
code, it doesn't much matter whether the rest is pattern-action like
in flex or something else. -John]
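As a rough illustration of what "embedding a precompiled DFA" can look like,
here is a hand-built transition table for the ASCII identifier pattern
[a-zA-Z_][a-zA-Z_0-9]*. The table layout and all names are hypothetical; a
tool such as flex or re2c generates its own (different and more compact) form.
This is only a sketch of the idea.

```cpp
#include <cstddef>
#include <string>

// Input characters are first mapped to a small number of classes.
enum CharClass { kLetter, kDigit, kOther, kNumClasses };

// DFA states: kInIdent is the only accepting state, kDead has no way out.
enum State { kStart, kInIdent, kDead, kNumStates };

// The "precompiled" part: a constant table fixed at compile time.
constexpr int kTransitions[kNumStates][kNumClasses] = {
    /* kStart   */ {kInIdent, kDead,    kDead},
    /* kInIdent */ {kInIdent, kInIdent, kDead},
    /* kDead    */ {kDead,    kDead,    kDead},
};

// ASCII-only classification; '_' counts as a letter.
CharClass classify(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return kLetter;
    if (c >= '0' && c <= '9') return kDigit;
    return kOther;
}

// Runs the DFA from pos and returns the length of the longest match (0 if none).
std::size_t match_identifier_dfa(const std::string& s, std::size_t pos) {
    int state = kStart;
    std::size_t i = pos;
    std::size_t last_accept = pos;  // end of the longest accepted prefix so far
    while (i < s.size()) {
        state = kTransitions[state][classify(s[i])];
        if (state == kDead) break;
        ++i;
        if (state == kInIdent) last_accept = i;
    }
    return last_accept - pos;
}
```

The driver loop knows nothing about identifiers; only the table and the
classifier do. Whether that table sits behind a pattern-action file or some
other interface does not change the matching code.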