Regular expressions in lexing and parsing

Ben Hanson <jamin.hanson@googlemail.com>
Sat, 18 May 2019 13:24:40 +0100

          From comp.compilers

From: Ben Hanson <jamin.hanson@googlemail.com>
Newsgroups: comp.compilers
Date: Sat, 18 May 2019 13:24:40 +0100
Organization: Compilers Central
References: 19-05-092
Keywords: parse, lex, DFA, comment
Posted-Date: 20 May 2019 17:55:59 EDT
Content-Language: en-US

  >[Originally here
https://commandcenter.blogspot.com/2011/08/regular-expressions-in-lexing-and.html


  >It took me a minute to figure out what he was saying, since the patterns in a lexer are regular
  >expressions. I believe the point is not to use general purpose regex libraries but rather to
  >use something like flex or re2c which will take a set of expressions and actions and
  >precompile them. -John]


I have my doubts about that interpretation, because of these two passages from the original post:


  >Consider finding
  >alphanumeric identifiers. It's not too hard to write the regexp (something
  >like "[a-zA-Z_][a-zA-Z_0-9]*"), but really not much harder to write as a
  >simple loop. The performance of the loop, though, will be much higher and will
  >involve much less code under the covers.


and


  >And when we want to adjust our lexer to admit other character
  >types, such as Unicode identifiers, and handle normalization, and so on, the
  >hand-written loop can cope easily but the regexp approach will break down.
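
For concreteness, here is roughly what the two approaches in the first quoted passage might look like in C++17 for ASCII identifiers. This is only a sketch: the function names (scan_ident_loop, scan_ident_regex, is_ident_start, is_ident_char) are made up for illustration and come from neither the blog post nor any library. Each function returns the length of the identifier at the start of the input, or 0 if there is none.

```cpp
#include <cstddef>
#include <regex>
#include <string_view>

// Hypothetical helpers (ASCII only): may this character start / continue an identifier?
inline bool is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

inline bool is_ident_char(char c) {
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

// The "simple loop" version: no tables, no regex engine.
inline std::size_t scan_ident_loop(std::string_view in) {
    if (in.empty() || !is_ident_start(in.front())) return 0;
    std::size_t n = 1;
    while (n < in.size() && is_ident_char(in[n])) ++n;
    return n;
}

// The general-purpose regex library version, using std::regex.
// match_continuous forces the match to begin at the first character.
inline std::size_t scan_ident_regex(std::string_view in) {
    if (in.empty()) return 0;
    static const std::regex ident("[a-zA-Z_][a-zA-Z_0-9]*");
    std::cmatch m;
    if (std::regex_search(in.data(), in.data() + in.size(), m, ident,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::size_t>(m.length(0));
    }
    return 0;
}
```

Both return 5 for the input "foo_1 = 2". Neither says anything about a generated lexer, which is a third option and the one I care about.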


It doesn't help that flex *still* doesn't support Unicode, but according
to http://re2c.org/manual/features/encodings/encodings.html re2c does.


My view is that just as regex libraries have been embraced by most
languages (even C++ has std::regex now), the same should happen for
lexer and parser generators. This is the approach I have taken with
lexertl and parsertl. Hana Dusíková is doing interesting work in the
compile-time regex space with
https://github.com/hanickadot/compile-time-regular-expressions .
Apparently she is working on a DFA version this year, which will be
presented at C++Now - http://cppnow.org/history/2019/schedule/ . I'm
hoping that this new version can be used as a lexer generator, and then I
can switch to it for a lot of my lexing needs.
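
To give a rough idea of what a DFA built at compile time boils down to, here is a hand-written C++17 sketch for the identifier pattern discussed above. It is not the output of lexertl, re2c or the compile-time-regular-expressions library, and it does not use any of their APIs; the state names, kTable and scan_ident_dfa are all invented for illustration. A real generator would derive the table from the regex rather than have it spelled out by hand.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Illustrative DFA states for [a-zA-Z_][a-zA-Z_0-9]* (hand-built, not generated).
enum State : unsigned char { Start = 0, InIdent = 1, Dead = 2 };

// Build the transition table at compile time: one row per live state,
// one column per possible byte value.
constexpr std::array<std::array<unsigned char, 256>, 2> make_table() {
    std::array<std::array<unsigned char, 256>, 2> t{};
    for (int c = 0; c < 256; ++c) {
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        const bool digit = (c >= '0' && c <= '9');
        t[Start][c] = alpha ? InIdent : Dead;
        t[InIdent][c] = (alpha || digit) ? InIdent : Dead;
    }
    return t;
}

constexpr auto kTable = make_table();

// Run the DFA from the start of the input; return the length of the
// longest identifier prefix, or 0 if the input does not start with one.
constexpr std::size_t scan_ident_dfa(std::string_view in) {
    unsigned char state = Start;
    std::size_t n = 0;
    for (; n < in.size(); ++n) {
        const unsigned char next = kTable[state][static_cast<unsigned char>(in[n])];
        if (next == Dead) break;
        state = next;
    }
    return state == InIdent ? n : 0;
}

// The whole thing can be evaluated by the compiler.
static_assert(scan_ident_dfa("foo_1 = 2") == 5);
```

The inner loop is one table lookup per input byte, which is why I don't think the "regexps are slow" argument carries over to precompiled DFAs.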


Regards,


Ben
[I agree that once you have some way to embed precompiled DFAs in your
code, it doesn't much matter whether the rest is pattern-action like
in flex or something else. -John]

