Related articles |
---|
Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2009-02-06) |
Re: Parsing implicit operators with recursive descent? mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2009-02-07) |
Re: Parsing implicit operators with recursive descent? armelasselin@hotmail.com (Armel) (2009-02-07) |
Re: Parsing implicit operators with recursive descent? torbenm@pc-003.diku.dk (2009-02-09) |
Re: Parsing implicit operators with recursive descent? barry.j.kelly@gmail.com (Barry Kelly) (2009-02-12) |
Re: Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2010-01-30) |
Re: Parsing implicit operators with recursive descent? kkylheku@gmail.com (Kaz Kylheku) (2010-02-01) |
From: | Kaz Kylheku <kkylheku@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Mon, 1 Feb 2010 21:08:28 +0000 (UTC) |
Organization: | A noiseless patient Spider |
References: | 09-02-013 09-02-022 10-01-097 |
Keywords: | parse |
Posted-Date: | 01 Feb 2010 18:26:56 EST |
On 2010-01-30, Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> wrote:
> A year ago, torbenm@pc-003.diku.dk (Torben Fgidius Mogensen) wrote:
>
>> "Johann 'Myrkraverk' Oskarsson" <johann@myrkraverk.com> writes:
>>
>>> Is it possible to parse implicit operators, like the regular
>>> expression concatenation operator, with a recursive descent parser?
>>
>> Yes, you can do it mostly by standard grammar-rewriting techniques.
>>
>> Let us take regular expressions as an example. We start with the
>> grammar:
>>
>>
>> R -> R | R
>> R -> R R
>> R -> R *
>> R -> a
>>
>> Note that the | above is a terminal. a stands for any alphabet
>> symbol.
The above grammar doesn't account for regular expressions
with empty branches;
abc|
i.e. (abc|): match abc, or empty string.
Simply adding a production R -> <epsilon> causes a problem.
Now R -> R * can derive R -> <epsilon> *. I.e. a * is
a R, all by itself!
What you can do in a recursive parser is recognize that certain lookaheads in
only certain situations generate an empty R.
whole_regex := regex
regex := branch | regex
| branch
/*
* Here is where your implicit catenation is represented:
* a branch is either one term, or a term followed by a branch.
* We allow a branch to be empty, which is crucial in regexes.
*/
branch := term branch
| term
| /* empty! */ /* thus, regex derives empty! */
term := primary
| primary * /* useless compound star X**... syntax is invalid */
primary | letter
| [ class ]
| ( regex ) /* ( ) is valid due to regex's empty derivation. */
So your recursive parser basically is something like this pseudocode:
function parse_whole_regex(tokstream)
{
parse_regex(tokstream);
if (lookahead(tokstream) != EOF)
syntax_error(tokstream, "stray token %t", lookahead(tokstream));
}
function parse_regex(tokstream)
{
parse_branch(tokstream);
while (lookahead(tokstream) == '|') {
consume(tokstream);
parse_branch(tokstream);
}
/* At this point lookahead is EOF (we parsed a whole regex)
* or the character ')' (we parsed a parenthesized regex).
*/
}
function parse_branch(tokstream)
{
loop {
case (lookahead(tokstream)) {
EOF, '|', ')' : break;
}
parse_term(tokstream);
}
}
function parse_term(tokstream)
{
parse_primary();
if (lookahead(tokstream) == '*')
consume(tokstream);
}
function parse_primary(tokstream)
{
case (lookahead(tokstream)) {
'*' : syntax_error("operator * without operand");
'(' : consume(tokstream);
parse_regex(tokstream);
if (lookahead(tokstream) != ')')
syntax_error("unbalanced parenthesis");
consume(tokstream);
'[' : parse_charclass(tokstream);
letter : parse_letter(tokstream)
default : syntax_error("symbol %t misused as a regex term",
lookahead(tokstream));
}
}
function parse_charclass(tokstream)
{
...
}
function parse_letter(tokstream)
{
...
}
Return to the
comp.compilers page.
Search the
comp.compilers archives again.