Re: re2c-1.0 released!

Ulya Trofimovich <skvadrik@gmail.com>
Sun, 3 Sep 2017 17:58:33 +0100

          From comp.compilers


From: Ulya Trofimovich <skvadrik@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 3 Sep 2017 17:58:33 +0100
Organization: Compilers Central
References: 17-08-007 17-09-001 17-09-003
Keywords: lex
Content-Language: en-US

> In fact we can do it all in standard lex.
> [...]
> Pseudo-code, including a mechanism for returning to the INITIAL
> state without parser involvement:
> [...]


The thing is, the new algorithm is both more *efficient* and more *elegant*.
Consider how parsing a date in the format 'YYYY-MM-DD' would look in re2c:




@y [0-9]{4} [-] @m [0-9]{2} [-] @d [0-9]{2} {
        unsigned year
                = (y[0] - '0') * 1000
                + (y[1] - '0') * 100
                + (y[2] - '0') * 10
                + (y[3] - '0');
        unsigned month
                = (m[0] - '0') * 10
                + (m[1] - '0');
        unsigned day
                = (d[0] - '0') * 10
                + (d[1] - '0');
}




Here re2c will optimize out all dynamic variables, because @d is always
2 characters before the end of the match, @m is always 3 characters
before @d, and @y is always 5 characters before @m. So all three
positions are computed as static offsets from the cursor once it points
to the end of the match.
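
To make the offset arithmetic concrete, here is a hand-written C sketch of
what the fixed-offset computation amounts to. It is not re2c output: the
names `struct date`, `digits` and `date_from_cursor` are made up for
illustration, and the function assumes the cursor already points just past
a successfully matched 'YYYY-MM-DD'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type, not part of re2c. */
struct date { unsigned year, month, day; };

/* Convert n ASCII digits starting at p to a number. */
static unsigned digits(const char *p, size_t n)
{
    unsigned v = 0;
    for (size_t i = 0; i < n; ++i) {
        v = v * 10 + (unsigned)(p[i] - '0');
    }
    return v;
}

/* Assumes cursor points just past a matched "YYYY-MM-DD". */
static struct date date_from_cursor(const char *cursor)
{
    assert(cursor != NULL);
    const char *d = cursor - 2; /* "DD" ends at the cursor       */
    const char *m = d - 3;      /* "MM-" sits right before "DD"  */
    const char *y = m - 5;      /* "YYYY-" sits right before "MM" */
    struct date r = { digits(y, 4), digits(m, 2), digits(d, 2) };
    return r;
}
```

No position is stored while scanning; everything is derived from the final
cursor by constant subtraction (2, 5 and 10 characters back).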




If comments and whitespace are allowed between date components, we could
change the above example like this (now re2c will use variables to track
submatch positions, since components have variable length):




com = "/*" ([^*] | ("*"+ [^*/]))* "*"+ "/";
wsp = ([ \t\n\r] | com)*;
hyp = wsp [-] wsp;


@y [0-9]{4} hyp @m [0-9]{2} hyp @d [0-9]{2} {
        unsigned year
                = (y[0] - '0') * 1000
                + (y[1] - '0') * 100
                + (y[2] - '0') * 10
                + (y[3] - '0');
        unsigned month
                = (m[0] - '0') * 10
                + (m[1] - '0');
        unsigned day
                = (d[0] - '0') * 10
                + (d[1] - '0');
}
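
For contrast with the fixed-length case, here is a hand-written C sketch of
what "tracking positions in variables" means once the separators have
variable length. It is only an illustration under my own assumptions, not
the code re2c generates: the helpers `skip_wsp`, `skip_digits`, `skip_hyp`
and `match_date` are hypothetical names, the input is assumed to be
NUL-terminated, and the separator is taken to be any mix of whitespace and
C-style comments around a single hyphen.

```c
#include <assert.h>
#include <stddef.h>

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Skip whitespace and C-style comments; NULL on an unterminated comment. */
static const char *skip_wsp(const char *p)
{
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
            ++p;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) ++p;
            if (*p == '\0') return NULL;
            p += 2; /* step over the closing delimiter */
        } else {
            return p;
        }
    }
}

/* Require exactly n digits at p; return the position after them. */
static const char *skip_digits(const char *p, int n)
{
    for (int i = 0; i < n; ++i) {
        if (!is_digit(p[i])) return NULL;
    }
    return p + n;
}

/* Separator: optional whitespace/comments, a hyphen, then more of the same. */
static const char *skip_hyp(const char *p)
{
    p = skip_wsp(p);
    if (p == NULL || *p != '-') return NULL;
    return skip_wsp(p + 1);
}

/* Returns the position past the match, or NULL. On success the three
 * out-parameters play the role of the tag variables y, m and d. */
static const char *match_date(const char *s, const char **y,
                              const char **m, const char **d)
{
    assert(s != NULL && y != NULL && m != NULL && d != NULL);
    const char *p = s;
    const char *ty = p; /* position saved when passing @y */
    if ((p = skip_digits(p, 4)) == NULL) return NULL;
    if ((p = skip_hyp(p)) == NULL) return NULL;
    const char *tm = p; /* position saved when passing @m */
    if ((p = skip_digits(p, 2)) == NULL) return NULL;
    if ((p = skip_hyp(p)) == NULL) return NULL;
    const char *td = p; /* position saved when passing @d */
    if ((p = skip_digits(p, 2)) == NULL) return NULL;
    *y = ty; *m = tm; *d = td;
    return p;
}
```

The distance between the components now depends on the input, so the
positions have to be saved during the scan instead of being derived from
the final cursor by a constant.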

