From: Kaz Kylheku <398-816-0742@kylheku.com>
Newsgroups: comp.compilers
Date: Sun, 3 Sep 2017 15:14:29 +0000 (UTC)
Organization: Aioe.org NNTP Server
References: 17-08-007 17-09-001 17-09-003
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="40873"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, design
Posted-Date: 03 Sep 2017 12:04:34 EDT

On 2017-09-02, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Kaz Kylheku <398-816-0742@kylheku.com> writes:
>>Briefly, why would you do some hacky regex thing in lex with \1, \2,
>>\3, when in the level immediately above yylex() you have proper phrase
>>recognition, with $1, $2, $3.
>
> These don't do the same thing. Taking your example from the other
> posting, do you want to recognize
>
> 2017
> -08
> /* bla bla */ -
> 28
>
> as date? If so, do it at the parser level, if not, the scanner.

We face this choice only because we put a hack into the lexer:
when it scans a whitespace or comment token, it throws the token away,
so the parser never sees it.

That hack is very easy to implement and considerably simplifies the
grammar.

We can just as easily implement another hack that lets us recognize
2017-08-28 as three integer tokens with no comments or whitespace
between them.

In fact, we can do it all in standard lex.

We can set up a custom YYINPUT which allows the entire yytext to be
pushed back into the stream. The rest is done by orchestrating one
or more start states.
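
A pushback mechanism of this kind could be sketched in C roughly as follows. This is only an illustration under assumptions: push_back_char, next_char and PUSHBACK_MAX are hypothetical names, and how next_char gets hooked into the scanner's input macro depends on the lex implementation in use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PUSHBACK_MAX 256   /* hypothetical capacity */

/* Hypothetical pushback stack; characters are popped in LIFO order. */
static char pushback[PUSHBACK_MAX];
static size_t pushback_len = 0;

/* Push one character back so that it is the next one read. */
static void push_back_char(int c)
{
    assert(pushback_len < PUSHBACK_MAX);
    pushback[pushback_len++] = (char)c;
}

/* Push a whole string back. The last character goes in first, so the
   string is read back in its original order. */
static void push_back_string(const char *s)
{
    size_t n = strlen(s);
    while (n > 0)
        push_back_char((unsigned char)s[--n]);
}

/* Read the next character: pushed-back text first, then the stream.
   A custom input hook for the scanner would call something like this. */
static int next_char(FILE *in)
{
    if (pushback_len > 0)
        return (unsigned char)pushback[--pushback_len];
    return getc(in);
}
```

Because the stack is last-in, first-out, anything that must be read after the pushed-back text (such as an end marker) has to be pushed before it.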

The steps to recognizing a date would then be:

1. First, NNNN-NN-NN is scanned as a single token.
2. push_back_string(yytext) is called.
3. BEGIN(DATE) is invoked.
4. The rule doesn't return, so the lexer re-scans the pushed-back input
   in the DATE state.
5. In the DATE state, an integer token is recognized; everything
   else is an error. Dash tokens can be recognized and returned,
   or consumed in the lexer.
6. Successful recognition of an integer token in the DATE state
   returns not an INTEGER to the parser but a DINTEGER.
7. The parser's phrase structure for matching dates refers to DINTEGER
   tokens (terminals), not INTEGER.
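
Pseudo-code, including a mechanism for returning to the INITIAL state
without parser involvement:

%x DATE

%%

<INITIAL,DATE>[0-9]+ {
    yylval.value = str_to_int(yytext);  /* our function */
    return (YYSTATE == DATE) ? DINTEGER : INTEGER;
}

<INITIAL,DATE>- {
    return '-';
}

[0-9]+-[0-9]+-[0-9]+ {
    /* Pushback is last-in, first-out, so the end signal goes in
       first and is read back after the date text.  Note: with flex's
       default %pointer, unput() clobbers yytext; copy yytext first
       or use %array. */
    unput('!');                /* end signal */
    push_back_string(yytext);  /* our function; works with our YYINPUT */
    BEGIN(DATE);
    /* no return: the pushed-back text is re-scanned in DATE */
}

<DATE>! {  /* our end signal: include other states here */
    BEGIN(INITIAL);
}

<DATE>. {
    /* internal error: how did we get here? */
}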
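
To make the control flow concrete, here is a small hand-written C model of the same idea. It is a sketch under assumptions, not generated lex output: the struct, the helper names and the T_ token codes are all hypothetical, and whitespace skipping stands in for the usual whitespace/comment hack.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define TEXT_MAX 64
enum token { T_EOF = 0, T_INTEGER = 256, T_DINTEGER, T_ERROR }; /* '-' is itself */
enum state { S_INITIAL, S_DATE };

struct scanner {
    const char *input;            /* unread part of the source text */
    char pushback[TEXT_MAX + 2];  /* LIFO stack of pushed-back characters */
    size_t npush;
    enum state state;
    int value;                    /* semantic value of the last integer */
};

static void scanner_init(struct scanner *s, const char *input)
{
    s->input = input; s->npush = 0; s->state = S_INITIAL; s->value = 0;
}

static int get(struct scanner *s)
{
    if (s->npush > 0) return (unsigned char)s->pushback[--s->npush];
    return *s->input ? (unsigned char)*s->input++ : EOF;
}

static void unget(struct scanner *s, int c)
{
    assert(s->npush < sizeof s->pushback);
    s->pushback[s->npush++] = (char)c;
}

/* Append a run of digits to text; return how many were read. */
static size_t scan_digits(struct scanner *s, char *text, size_t *n)
{
    size_t count = 0;
    int c;
    while ((c = get(s)) != EOF && isdigit(c)) {
        assert(*n < TEXT_MAX - 1);
        text[(*n)++] = (char)c;
        count++;
    }
    if (c != EOF) unget(s, c);
    return count;
}

/* Append '-' followed by at least one digit; return 1 on success. */
static int scan_dash_digits(struct scanner *s, char *text, size_t *n)
{
    int c = get(s);
    if (c != '-') { if (c != EOF) unget(s, c); return 0; }
    assert(*n < TEXT_MAX - 1);
    text[(*n)++] = '-';
    return scan_digits(s, text, n) > 0;
}

static int next_token(struct scanner *s)
{
    char text[TEXT_MAX];
    for (;;) {
        size_t n = 0, mark;
        int c = get(s);
        if (c == EOF) return T_EOF;
        if (c == '!' && s->state == S_DATE) { s->state = S_INITIAL; continue; }
        if (isspace(c) && s->state == S_INITIAL) continue; /* discarded */
        if (c == '-') return '-';
        if (!isdigit(c)) return T_ERROR;

        unget(s, c);
        mark = scan_digits(s, text, &n);
        if (s->state == S_INITIAL && scan_dash_digits(s, text, &n)
                                  && scan_dash_digits(s, text, &n)) {
            unget(s, '!');                      /* end signal goes in first */
            while (n > 0) unget(s, (unsigned char)text[--n]);
            s->state = S_DATE;                  /* no return: re-scan as DATE */
            continue;
        }
        while (n > mark) unget(s, (unsigned char)text[--n]); /* partial date */
        text[n] = '\0';
        s->value = atoi(text);
        return s->state == S_DATE ? T_DINTEGER : T_INTEGER;
    }
}
```

Fed the text "2017-08-28 2017 - 08", this model returns DINTEGER, '-', DINTEGER, '-', DINTEGER for the contiguous date and then INTEGER, '-', INTEGER for the spaced-out form, so a grammar can tell the two apart by token type alone.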