From: | Ulya Trofimovich <skvadrik@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 3 Sep 2017 17:58:33 +0100 |
Organization: | Compilers Central |
References: | 17-08-007 17-09-001 17-09-003 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="54656"; mail-complaints-to="abuse@iecc.com" |
Keywords: | lex |
Posted-Date: | 03 Sep 2017 13:43:46 EDT |
Content-Language: | en-US |
> In fact we can do it all in standard lex.
> [...]
> Pseudo-code, including a mechanism for returning to the INITIAL
> state without parser involvement:
> [...]
The thing is, the new algorithm is both more *efficient* and more *elegant*.
Consider what parsing a date in the format 'YYYY-MM-DD' would look like in re2c:
@y [0-9]{4} [-] @m [0-9]{2} [-] @d [0-9]{2} {
unsigned year
= (y[0] - '0') * 1000
+ (y[1] - '0') * 100
+ (y[2] - '0') * 10
+ (y[3] - '0');
unsigned month
= (m[0] - '0') * 10
+ (m[1] - '0');
unsigned day
= (d[0] - '0') * 10
+ (d[1] - '0');
}
Here re2c will optimize out all dynamic variables, because @d is always
2 characters before the end of the match, @m is 3 characters before @d,
and @y is 5 characters before @m. So all three tags are calculated as
static offsets from the cursor once it points to the end of the match.
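To make the fixed-offset idea concrete, here is a minimal hand-written sketch in plain C. It is not what re2c generates; the names `digits`, `struct date` and `date_from_cursor` are made up for illustration, and the function assumes the cursor already points just past a successfully matched 'YYYY-MM-DD'.

```c
#include <assert.h>

/* Hypothetical helper (not re2c output): convert n ASCII digits to a number. */
static unsigned digits(const char *p, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; ++i) {
        assert(p[i] >= '0' && p[i] <= '9');
        v = v * 10 + (unsigned)(p[i] - '0');
    }
    return v;
}

struct date { unsigned year, month, day; };

/* Assumption: cursor points just past a matched 'YYYY-MM-DD'.
 * No tag variables are stored during matching; every position is
 * recovered from the cursor by a constant offset. */
static struct date date_from_cursor(const char *cursor)
{
    const char *d = cursor - 2; /* "DD"    is 2 characters long */
    const char *m = d - 3;      /* "MM-"   is 3 characters long */
    const char *y = m - 5;      /* "YYYY-" is 5 characters long */
    struct date r = { digits(y, 4), digits(m, 2), digits(d, 2) };
    return r;
}
```

For the string "2017-09-03" with the cursor at offset 10, this yields year 2017, month 9 and day 3.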
If comments and whitespace are allowed between date components, we could
change the above example like this (now re2c will use variables to track
submatch positions, since components have variable length):
com = "/*" ([^*] | ("*" [^/]))* "*""/";
wsp = [ \t\n\r]* | com;
hyp = wsp [-] wsp;
@y [0-9]{4} hyp @m [0-9]{2} hyp @d [0-9]{2} {
unsigned year
= (y[0] - '0') * 1000
+ (y[1] - '0') * 100
+ (y[2] - '0') * 10
+ (y[3] - '0');
unsigned month
= (m[0] - '0') * 10
+ (m[1] - '0');
unsigned day
= (d[0] - '0') * 10
+ (d[1] - '0');
}
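For comparison, here is a rough hand-written C sketch of why the positions now have to be stored in variables: the separators have variable length, so @m and @d can no longer be derived from the cursor. This is only an illustration, not re2c's generated automaton. All names (`struct tags`, `skip_wsp`, `take_digits`, `take_hyphen`, `match_date`) are hypothetical, and for brevity the separator skipping accepts any mix of whitespace and comments, which is slightly more permissive than the `wsp` definition above.

```c
#include <stddef.h>

/* Hypothetical tag storage: one saved position per tag. */
struct tags { const char *y, *m, *d; };

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Skip whitespace and C-style comments in a NUL-terminated string.
 * Returns NULL if a comment is not terminated. */
static const char *skip_wsp(const char *p)
{
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
            ++p;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) ++p;
            if (*p == '\0') return NULL;
            p += 2;
        } else {
            return p;
        }
    }
}

/* Consume exactly n digits, or return NULL. */
static const char *take_digits(const char *p, int n)
{
    for (int i = 0; i < n; ++i)
        if (!is_digit(p[i])) return NULL;
    return p + n;
}

/* Consume a hyphen with optional separators on both sides, or return NULL. */
static const char *take_hyphen(const char *p)
{
    p = skip_wsp(p);
    if (p == NULL || *p != '-') return NULL;
    return skip_wsp(p + 1);
}

/* Returns a pointer just past the match, or NULL if there is no match.
 * Each tag is recorded at the moment the scan passes its position. */
static const char *match_date(const char *p, struct tags *t)
{
    t->y = p;
    if ((p = take_digits(p, 4)) == NULL) return NULL;
    if ((p = take_hyphen(p)) == NULL) return NULL;
    t->m = p;
    if ((p = take_digits(p, 2)) == NULL) return NULL;
    if ((p = take_hyphen(p)) == NULL) return NULL;
    t->d = p;
    return take_digits(p, 2);
}
```

After a successful call, `t.y`, `t.m` and `t.d` can be used exactly like `y`, `m` and `d` in the semantic action above.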