Re: How make multifinished DFA for merged regexps?

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Tue, 24 Dec 2019 04:42:58 -0500

          From comp.compilers


From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Tue, 24 Dec 2019 04:42:58 -0500
Organization: Compilers Central
Keywords: lex, design
Posted-Date: 25 Dec 2019 21:23:09 EST

Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:


> Why should "123." not form a valid float number? In fact it's the C way
> to force a possibly int number into a float.


This is actually a good point. If you are defining a language (rather
than simply implementing a standard language that is already well
specified), your tokens should generally not have prefixes that are
errors. Thus, if 123.0 is a float, then 1, 12, 123, and 123. should all
be legal tokens or, if they are errors, be covered by specific "error"
tokens, as in "123." -> errorMissingDigitsAfterDot


If you do that, the lexer needs only one character of lookahead and
no complicated backtracking.
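
To make that concrete, here is a minimal sketch in Python of a numeric
scanner in which every state after the start state is accepting. The
state names, the "int" and "float" token names, and the helper
functions are assumptions made up for illustration; only
errorMissingDigitsAfterDot comes from the text above. Because the
scanner can stop in any state and still emit a token, it only ever
peeks at the next character and never backs up.

```python
# Hypothetical token kinds; only errorMissingDigitsAfterDot is from the post.
INT, FLOAT, ERR_DOT = "int", "float", "errorMissingDigitsAfterDot"

# Every non-start state is accepting: state -> token kind emitted there.
ACCEPT = {"digits": INT, "dot": ERR_DOT, "frac": FLOAT}


def step(state, ch):
    """Return the next DFA state, or None if there is no transition."""
    if "0" <= ch <= "9":
        return {"start": "digits", "digits": "digits",
                "dot": "frac", "frac": "frac"}[state]
    if ch == "." and state == "digits":
        return "dot"
    return None


def scan_number(text, pos=0):
    """Scan one numeric token at pos; return (kind, lexeme, end)."""
    state, i = "start", pos
    while i < len(text):
        nxt = step(state, text[i])  # look at exactly one character
        if nxt is None:
            break                   # stop here; no need to back up
        state, i = nxt, i + 1
    if state == "start":
        raise ValueError(f"no numeric token at position {pos}")
    return ACCEPT[state], text[pos:i], i
```

With these definitions, scan_number("123.") yields the error token for
the whole lexeme "123." rather than forcing the scanner to retreat to
"123" and rescan the dot.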


So, after you generate your lexer, you should look at all error states
(or error transitions, depending on how your FSA implements errors) and
determine whether you should change your token definitions to cover
those cases. You can generalize some token definition so that the
error becomes legal, define an error token to cover that case, or, as I
suggested previously, make a rule that matches that case as two (or
more) tokens. There may be cases where you do nothing and just leave
the FSA as is, but you should do so consciously, as a deliberate choice.
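
One way to do that review mechanically, sketched here in Python under
some assumptions: the DFA is available as a dictionary of transitions
(a made-up representation, not the output format of any particular
lexer generator), and "error state" is taken to mean a reachable state
that is not accepting. The function lists each such state together
with a shortest input that reaches it, so you can decide case by case
what to do about it.

```python
from collections import deque


def non_accepting_states(start, transitions, accepting):
    """Map each reachable non-accepting state to a shortest input reaching it."""
    witness = {start: ""}
    queue = deque([start])
    while queue:  # breadth-first search, so witnesses are shortest
        state = queue.popleft()
        for ch, nxt in sorted(transitions.get(state, {}).items()):
            if nxt not in witness:
                witness[nxt] = witness[state] + ch
                queue.append(nxt)
    return {s: w for s, w in witness.items()
            if s != start and s not in accepting}


# Hypothetical float DFA in which "123." is not yet covered by any token.
# The symbol "d" stands for any digit.
transitions = {
    "start": {"d": "digits"},
    "digits": {"d": "digits", ".": "dot"},
    "dot": {"d": "frac"},
    "frac": {"d": "frac"},
}
gaps = non_accepting_states("start", transitions,
                            accepting={"digits", "frac"})
print(gaps)  # {'dot': 'd.'}
```

Here the report flags the "dot" state, reached by digits followed by a
dot. Making that state accepting (as a float or as an error token)
empties the report; leaving it alone is also possible, but then it is
a choice you made on purpose.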


--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris

