Re: Regular expressions in lexing and parsing

Quinn Jackson <quinn.jackson@ieee.org>
Tue, 18 Jun 2019 12:57:39 -0700

From comp.compilers

Related articles
Regular expressions in lexing and parsing ed_davis2@yahoo.com.dmarc.email (Ed Davis) (2019-05-17)
Regular expressions in lexing and parsing jamin.hanson@googlemail.com (Ben Hanson) (2019-05-18)
Re: Regular expressions in lexing and parsing DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-05-21)
Re: Regular expressions in lexing and parsing drikosev@gmail.com (Ev. Drikos) (2019-05-23)
Re: Regular expressions in lexing and parsing christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-06-17)
Re: Regular expressions in lexing and parsing quinn.jackson@ieee.org (Quinn Jackson) (2019-06-18)
*Re: Regular expressions in lexing and parsing quinn.jackson@ieee.org (Quinn Jackson) (2019-06-18)*
Re: Regular expressions in lexing and parsing 847-115-0292@kylheku.com (Kaz Kylheku) (2019-06-18)
Re: Regular expressions in lexing and parsing christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-06-18)


From:	Quinn Jackson <quinn.jackson@ieee.org>
Newsgroups:	comp.compilers
Date:	Tue, 18 Jun 2019 12:57:39 -0700
Organization:	Compilers Central
References:	19-06-005 19-06-008
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="15907"; mail-complaints-to="abuse@iecc.com"
Keywords:	lex, parse
Posted-Date:	18 Jun 2019 16:17:36 EDT
In-Reply-To:	19-06-008

On Tue, Jun 18, 2019 at 12:23 PM John wrote:
> [I agree with the sentiment to use formal grammars for both lex and
> parse, but it always has seemed to me that running them separately as
> coroutines makes it easier to deal with comments and whitespace
> without having to hang them on every token definition. I realize that
> the more context sensitive the language, the more it is likely to make
> sense to combine them. -John]

The Meta-S system has a way to do that automatically. The "whitespace
rule" (__ws is the reserved name for it) also has two alias sequences
that make it simple to specify optional versus required whitespace (#?
and ## respectively):

some_rule ## some_other_rule // whitespace is not optional

some_rule #? "(" #? some_other_rule #? ")" // whitespace in all positions is optional

Moreover, __ws itself can be defined as:

// defined as standard C-type whitespace, and not added to the AST
// when recognized in the input
__ws ::= #notree #standard_whitespace;

I have found that, while sometimes verbose, this explicit handling
makes the dynamic nature of the grammars far easier to implement at
the back-end. (At the cost of the human grammar author having to think
about whitespace more often than some would prefer.)
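
To make the required-versus-optional distinction concrete, here is a
minimal sketch in Python (not Meta-S, and not how Meta-S is implemented).
It assumes a production is just a flat list of literal strings and
whitespace markers; the names Ws and match_sequence are hypothetical and
exist only for this illustration. Ws.REQUIRED plays the role of ## and
Ws.OPTIONAL the role of #? in the examples above.

```python
import re
from enum import Enum


class Ws(Enum):
    """Hypothetical stand-ins for the two whitespace operators."""
    REQUIRED = "##"  # at least one whitespace character must be present
    OPTIONAL = "#?"  # zero or more whitespace characters may be present


_WS_RUN = re.compile(r"\s*")  # always matches, possibly the empty string


def match_sequence(items, text):
    """Return True if text matches the literals and whitespace markers in order."""
    pos = 0
    for item in items:
        if isinstance(item, Ws):
            end = _WS_RUN.match(text, pos).end()
            if item is Ws.REQUIRED and end == pos:
                return False  # required whitespace was missing
            pos = end
        else:
            if not text.startswith(item, pos):
                return False  # literal did not match at this position
            pos += len(item)
    return pos == len(text)


call = ["f", Ws.OPTIONAL, "(", Ws.OPTIONAL, "x", Ws.OPTIONAL, ")"]
decl = ["int", Ws.REQUIRED, "x"]
```

With these definitions, call accepts both "f(x)" and "f ( x )", while
decl accepts "int x" and rejects "intx".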

If I recall correctly, it was Hans-Peter Diettrich (the OP) who
convinced me a number of years back to include the ability to specify
certain tokens that did not need explicit whitespace around them in
the productions that use them. (The grammar compiler itself then takes
responsibility for inserting the appropriate whitespace operators
behind the scenes during compilation when those tokens are used.)
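
As a rough illustration of that kind of behind-the-scenes rewriting,
here is a small Python sketch. It is only a guess at the general shape of
such a pass, not the Meta-S grammar compiler's actual algorithm. The
function name insert_implicit_whitespace, the self_delimiting set, and
the choice to insert the optional operator (#?) are all assumptions made
for the example.

```python
WS_OPERATORS = {"##", "#?"}  # explicit whitespace operators in a production


def insert_implicit_whitespace(production, self_delimiting):
    """Insert "#?" next to declared tokens where no operator was written.

    production      -- list of symbols, e.g. ['some_rule', '"("', ...]
    self_delimiting -- hypothetical set of tokens declared as not needing
                       explicit whitespace around them
    """
    out = []
    for symbol in production:
        if out:
            prev = out[-1]
            explicit = prev in WS_OPERATORS or symbol in WS_OPERATORS
            touches_declared = prev in self_delimiting or symbol in self_delimiting
            if not explicit and touches_declared:
                out.append("#?")  # assumption: implicit whitespace is optional
        out.append(symbol)
    return out


parens = {'"("', '")"'}
written = ["some_rule", '"("', "some_other_rule", '")"']
expanded = insert_implicit_whitespace(written, parens)
```

Here expanded comes out as the fully explicit form of the production,
some_rule #? "(" #? some_other_rule #? ")", and any operator the grammar
author did write explicitly is left untouched.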

In the long run, this hybrid approach has offered the most
versatility, without being too cumbersome. (There are bigger dragons
to fight with adaptive systems that can modify their rules in
mid-parse...)

--
Quinn Jackson CSci MIScT SMACM SMIEEE FRSA

LinkedIn: http://ca.linkedin.com/in/quinnjackson/
ResearchGate: http://researchgate.net/profile/Quinn_Jackson/
