Simple Lexer and Simple Parser [ was RE: Flex is the most powerful lexical analysis language in the world. True or False? ]

Roger L Costello <costello@mitre.org>
Sun, 8 May 2022 13:34:03 +0000

          From comp.compilers


From: Roger L Costello <costello@mitre.org>
Newsgroups: comp.compilers
Date: Sun, 8 May 2022 13:34:03 +0000
Organization: Compilers Central
References: 22-05-003 22-05-007 22-05-009 22-05-018
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="38724"; mail-complaints-to="abuse@iecc.com"
Keywords: lex
Posted-Date: 08 May 2022 14:45:06 EDT
Content-Language: en-US

Thank you again, Chris. Terrific information.


Another question, if I may. You wrote:


> And that goes to an important point. Your lexer *should be* almost
> trivially simple (i.e. regular expressions only and not complicated
> ones). You rarely want to solve problems at the lexical level. You
> are much less likely to get good error reporting if you do. In most
> cases, your parser should be simple also.


For a while now I have been (for fun) building a parser for XML
documents. I have experimented with making the lexer simple and with
making the parser simple. If I make the lexer simple, then the parser
is complex. If I make the lexer complex (using lots of states, i.e.
start conditions, and making heavy use of Flex's start-condition
stack), then the parser is simple. It doesn't seem possible to make
both the lexer and the parser simple.


There are lots of "conditional rules" in XML. For example, in XML the
&amp; is an entity reference (amp is one of XML's predefined
entities). Since & is a reserved character, XML documents need to use
&amp; instead of a bare &, and an XML parser is expected to convert
&amp; back to &. However, if the &amp; appears in certain contexts --
within a comment or within a CDATA section -- then it is not
converted. Thus, there is conditional processing:


IF (&amp; is in a comment or in a CDATA section) THEN
        OUTPUT(&amp;)
ELSE
        OUTPUT(&)
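
To make the rule above concrete, here is a minimal Python sketch (not
Flex, and not a real XML parser). It assumes that comments and CDATA
sections are well-formed and can each be matched as one token, so that
any &amp; inside them is never seen as a separate token. The names
TOKEN and expand_amp are made up for illustration.

```python
import re

# Hypothetical token pattern. Alternatives are tried left to right:
# a whole comment, a whole CDATA section, the &amp; reference,
# or any other single character.
TOKEN = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|&amp;|.", re.DOTALL)

def expand_amp(text: str) -> str:
    """Replace &amp; with & except inside comments and CDATA sections."""
    out = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if token == "&amp;":
            out.append("&")      # ELSE branch: OUTPUT(&)
        else:
            out.append(token)    # comments, CDATA, and other text pass through
    return "".join(out)
```

An unterminated comment or CDATA section is not treated as protected
in this sketch; the opening delimiter simply falls through to the
single-character alternative.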


Flex's states/stack mechanism is ideally suited for conditional
processing like this. From the section on Start Conditions in the Flex
manual: "flex provides a mechanism for conditionally activating
rules."
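
For readers who have not used start conditions, the idea can be
imitated in a few lines of Python. This is only a sketch of the
mechanism, not Flex itself: in Flex one would declare exclusive start
conditions with %x, enable the stack with %option stack, and call
yy_push_state() and yy_pop_state() from rule actions. The names below
(RULES, scan, and the state names) are hypothetical, and unlike Flex
the sketch takes the first matching rule rather than the longest
match.

```python
import re

# Hypothetical start-condition names.
INITIAL, COMMENT, CDATA = "INITIAL", "COMMENT", "CDATA"

# Each start condition has its own rules: (pattern, (action, argument)).
RULES = {
    INITIAL: [
        (re.compile(r"<!--"), ("push", COMMENT)),
        (re.compile(r"<!\[CDATA\["), ("push", CDATA)),
        (re.compile(r"&amp;"), ("emit", "&")),
    ],
    COMMENT: [(re.compile(r"-->"), ("pop", None))],
    CDATA: [(re.compile(r"\]\]>"), ("pop", None))],
}

def scan(text: str) -> str:
    stack = [INITIAL]          # the active start condition is stack[-1]
    out = []
    pos = 0
    while pos < len(text):
        for pattern, (action, arg) in RULES[stack[-1]]:
            match = pattern.match(text, pos)
            if match is None:
                continue
            if action == "push":
                stack.append(arg)            # enter COMMENT or CDATA
                out.append(match.group(0))
            elif action == "pop":
                stack.pop()                  # return to the enclosing state
                out.append(match.group(0))
            else:
                out.append(arg)              # emit the replacement text
            pos = match.end()
            break
        else:
            # Default rule: no pattern matched, copy one character.
            out.append(text[pos])
            pos += 1
    return "".join(out)
```

The &amp; rule exists only in the INITIAL state, so inside a comment or
CDATA section the characters are copied unchanged by the default rule.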


So while it would be great to have a simple lexer, I am leaning
towards dealing with the conditional rules in XML using the Flex
states/stack mechanism rather than dealing with the conditional rules
in Bison. In other words, I am leaning towards a complex lexer.


I am interested in hearing your thoughts on this.


> You don't need a flamethrower


My apologies. It wasn't my intent to throw a flame. But in hindsight I
can see that I should have worded things much better. I will do better
in the future.


/Roger

