RE: Flex is the most powerful lexical analysis language in the world. True or False?

Christopher F Clark <>
Sat, 7 May 2022 13:15:58 +0300


Newsgroups: comp.compilers
Organization: Compilers Central
References: 22-05-003 22-05-007 22-05-009
Keywords: lex
Posted-Date: 07 May 2022 18:10:06 EDT

Roger, since you asked, I will answer what solution I would reach for
(and give advice on what I think others should reach for).

First, John's advice is straight on. You don't need a flamethrower in
most cases, and in fact having one invites one to abuse it. I think
Purdue has a series of videos on the fastest way to light a barbeque
which end with completely melting the bbq in a matter of seconds using
liquid oxygen. Fast, but probably the wrong solution for grilling
burgers and hot dogs.

Next, if you already have a lexer, I would NOT change the technology
behind it, at least not by much, unless I had a specific reason to do
otherwise. I might switch from the original LEX to Flex (and from the
original yacc to Bison), but that would be one of the few exceptions
to the rule. If I ran into issues I would switch, but I would first
try to find workarounds.

In my last three projects, there was already a lexer-parser
combination in use. Across those projects (and four implementations)
we used Bison + Flex, ANTLR, Parser-RS, and JavaCC. In the last two,
I don't even know what lexer was used, as I never touched it other
than to add keywords.

And that goes to an important point. Your lexer *should be* almost
trivially simple (i.e. regular expressions only and not complicated
ones). You rarely want to solve problems at the lexical level. You
are much less likely to get good error reporting if you do. In most
cases, your parser should be simple also. You might want LR parsing
for expressions, but otherwise you want your grammar to be LL(1) (with
the if-then-else hack).
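To make "trivially simple" concrete, here is a minimal sketch (mine, not from the post) of a regex-only lexer in Python: every token is a plain regular expression, keywords are ordinary identifiers checked against a set, and anything cleverer is left to the parser. The token names and the keyword set are illustrative.

```python
import re

# Hypothetical keyword set for a toy language.
KEYWORDS = {"if", "then", "else", "while"}

# Each token is a simple, uncomplicated regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()<>]"),
    ("SKIP",   r"[ \t\n]+"),       # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, value) tokens, raising on bad input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "IDENT" and value in KEYWORDS:
            kind = value.upper()   # promote keywords to their own token kind
        tokens.append((kind, value))
    return tokens
```

The same shape translates almost line for line into a Flex specification; the point is that nothing here requires more than regular expressions.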

And, all else being equal, if you don't have a lexer-parser
combination you can reuse, I would pick somewhat based upon the
programming language, since most tools are relatively tied to one
language even when they support more than one.

I haven't decided on my favorite for Rust yet; Parser-RS isn't bad,
and Nom is also popular.

For Java, I would go with ANTLR4. And, overall, I would say that is
my current favorite despite a few nits.

For C++ or C#, I would use the Yacc++ we wrote, even though it needs
some tweaking to catch up to ANTLR at this point. I prefer our
solution to keywords to what ANTLR has, and indirect left recursion
(i.e. parsing expressions with lists of expressions as a first
argument) doesn't work right in ANTLR.
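A minimal illustration of that limitation (my sketch, with hypothetical rule names): ANTLR4 rewrites *direct* left recursion automatically, but a rule that reaches itself through another rule, as a call expression does below, is reported as mutually left-recursive and rejected, so the second rule has to be inlined into the first by hand.

```antlr
// Direct left recursion: ANTLR4 accepts and rewrites this form.
expr : expr '+' expr
     | call
     | ID
     ;

// Indirect left recursion: expr -> call -> expr.
// ANTLR4 reports these rules as mutually left-recursive
// and refuses to generate a parser for this grammar.
call : expr '(' ')' ;
```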

If someone else was paying for it, I would investigate the DMS
Toolkit from Semantic Designs, because they have done most of the
work to make GLR practical.

If I really wanted to solve types in my grammar, I would look into
Meta-S by Jackson. I don't know how available that is.

And, if I were using Scheme or Lisp, I would look into Racket.

Chris Clark email:
Compiler Resources, Inc. Web Site:
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
