Re: Some questions about recursive descent?

Alain Ketterlin <alain@universite-de-strasbourg.fr>
Tue, 01 Mar 2022 23:13:25 +0100

          From comp.compilers

Related articles
Some questions about recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2022-02-27)
Re: Some questions about recursive descent? gah4@u.washington.edu (gah4) (2022-02-27)
Re: Some questions about recursive descent? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2022-02-28)
Re: Some questions about recursive descent? anton@mips.complang.tuwien.ac.at (2022-02-28)
Re: Some questions about recursive descent? alain@universite-de-strasbourg.fr.invalid (Alain Ketterlin) (2022-02-28)
Re: Some questions about recursive descent? johann@myrkraverk.invalid (Johann 'Myrkraverk' Oskarsson) (2022-03-01)
Re: Some questions about recursive descent? anton@mips.complang.tuwien.ac.at (2022-03-01)
Re: Some questions about recursive descent? alain@universite-de-strasbourg.fr (Alain Ketterlin) (2022-03-01)
Re: Some questions about recursive descent? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2022-03-02)
Re: Some questions about recursive descent? xdpascal@xdpascal.com (Paul Robinson) (2022-03-05)

From: Alain Ketterlin <alain@universite-de-strasbourg.fr>
Newsgroups: comp.compilers
Date: Tue, 01 Mar 2022 23:13:25 +0100
Organization: Université de Strasbourg
References: 22-02-021 22-02-024 22-02-026 22-02-027
Keywords: parse, LL(1)
Posted-Date: 01 Mar 2022 17:40:05 EST

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> writes:


> Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>>>> My first question is: how are production recursive descent parsers con-
>>>> structed? From these two books, and other things I have read in the
>>>> past, recursive descent is so simple it's mostly just brushed off with
>>>> an example or two, but there seems to be a whole lot more to it. For
>>>> instance, [Appel] mentions a parse table but mostly does not explain how
>>>> to construct one. ...


[...]
> The demonstration starts with grammar 3.12, page 47.
>
> Z -> d          Y ->           X -> Y
> Z -> X Y Z      Y -> c         X -> a
>
> where Y goes to epsilon in the centre of the first line.
> Then [Appel] shows how FIRST and FOLLOW sets are computed
> given this grammar, ending with
>
>        nullable   FIRST     FOLLOW
>    X   yes        a c       a c d
>    Y   yes        c         a c d
>    Z   no         a c d
>
> where previous steps are also shown[1]. Then figure 3.14 shows a
> predictive parsing table, presumably constructed using the FIRST
> and FOLLOW sets.
>
>           a             c             d
>      +------------------------------------------+
>    X | X -> a        X -> Y        X -> Y        |
>      | X -> Y                                    |
>      |                                           |
>    Y | Y ->          Y ->          Y ->          |
>      |               Y -> c                      |
>      |                                           |
>    Z | Z -> X Y Z    Z -> X Y Z    Z -> d        |
>      |                             Z -> X Y Z    |
>      +------------------------------------------+
>
> Where the entries {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot
> be handled by a recursive descent parser, nor is it an LL(1) grammar, per
> the book.
>
> I am not clear on how to construct such a table from the description
> in the book, and would not presume to be able to do so by hand, using
> a more complex grammar.


That's a 2-dimensional table M[N,t] where N is a non-terminal, and t
is a terminal symbol. It is built by the following algorithm:


for each rule L -> R1...Rk
        for each terminal f in FIRST (R1...Rk)
                add the rule to M[L,f]
        if R1...Rk is nullable
                for each terminal f in FOLLOW (L)
                        add the rule to M[L,f]


In essence: FIRST lets you know when to expand a non-terminal, and
FOLLOW lets you know when to apply a rule that matches nothing in
the input.
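
If it helps, here is a rough sketch of that loop in Python (my own code,
not the book's; I take the nullable/FIRST/FOLLOW sets of grammar 3.12 as
given, i.e. the ones in your table, instead of computing them):

NONTERMINALS = {'X', 'Y', 'Z'}
GRAMMAR = [                        # (lhs, rhs) pairs, rhs is a tuple of symbols
    ('Z', ('d',)),
    ('Z', ('X', 'Y', 'Z')),
    ('Y', ()),                     # Y ->   (epsilon)
    ('Y', ('c',)),
    ('X', ('Y',)),
    ('X', ('a',)),
]
NULLABLE = {'X': True, 'Y': True, 'Z': False}
FIRST  = {'X': {'a', 'c'}, 'Y': {'c'}, 'Z': {'a', 'c', 'd'}}
FOLLOW = {'X': {'a', 'c', 'd'}, 'Y': {'a', 'c', 'd'}, 'Z': set()}

def first_of_string(symbols):
    """FIRST of a string R1...Rk, plus whether the whole string is nullable."""
    first = set()
    for s in symbols:
        if s not in NONTERMINALS:  # terminal: it starts the string, stop here
            first.add(s)
            return first, False
        first |= FIRST[s]
        if not NULLABLE[s]:
            return first, False
    return first, True             # every Ri was nullable

def build_table(grammar):
    table = {}                     # (nonterminal, terminal) -> set of rules
    for lhs, rhs in grammar:
        first, nullable = first_of_string(rhs)
        for f in first:            # expand lhs on any terminal that can start rhs
            table.setdefault((lhs, f), set()).add((lhs, rhs))
        if nullable:               # rhs can match nothing: use FOLLOW(lhs)
            for f in FOLLOW[lhs]:
                table.setdefault((lhs, f), set()).add((lhs, rhs))
    return table

M = build_table(GRAMMAR)
print({key for key, rules in M.items() if len(rules) > 1})
# prints the three conflicting entries: (X,a), (Y,c) and (Z,d)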


The grammar is LL(1) if all entries of the table contain at most one
rule. That is not the case in your example.


Parsing goes as follows: maintain a stack on which you initially push
the start symbol. Then repeat:


if top(stack) is a terminal symbol
        if it matches the next input
                pop it from the stack and advance input
        else
                signal error
else
        if M[top(stack),next_input] is empty
                signal error
        else // M[...] is a rule L -> R1...Rk
                pop an element from the stack and push Rk, ..., R1


(note that you push the rhs in reverse order, so that the leftmost
symbol appears on the top of the stack).


The parsing succeeds if you end up with an empty stack and an empty
input.
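
Here is the same loop as a small Python sketch (again my own code). The
table is for a made-up LL(1) grammar, S -> a S b and S -> (epsilon), not
for grammar 3.12, whose conflicts make it unusable with this driver:

END = '$'                              # end-of-input marker
NONTERMINALS = {'S'}

# (nonterminal, lookahead) -> right-hand side, built as described above
# from FIRST(S) = {a} and FOLLOW(S) = {b, $}; each entry holds one rule.
TABLE = {
    ('S', 'a'): ('a', 'S', 'b'),       # S -> a S b
    ('S', 'b'): (),                    # S ->   (epsilon)
    ('S', END): (),                    # S ->   (epsilon)
}

def parse(tokens, start='S'):
    tokens = list(tokens) + [END]
    pos = 0
    stack = [start]
    while stack:
        top = stack.pop()
        if top not in NONTERMINALS:    # terminal on top of the stack
            if top == tokens[pos]:
                pos += 1               # match: consume one input token
            else:
                return False           # terminal mismatch
        else:
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:
                return False           # empty table entry
            stack.extend(reversed(rhs))   # push Rk, ..., R1: leftmost on top
    return tokens[pos] == END          # empty stack and all input consumed

print(parse(['a', 'a', 'b', 'b']))     # True
print(parse(['a', 'b', 'b']))          # False

A recursive descent parser does the same work with the explicit stack
replaced by the call stack: one function per non-terminal, switching on
the next input token exactly as the table does.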


I don't have my copy of the dragon book with me, but from Pearson's web
site I find this is the subject of chapter 4, section 4.


> So I was wondering if people who create /recursive descent/ parsers for
> production compilers use such a table as an intermediary step, or not.
[...]


As John says, if your language fits the requirements (that is, it has an
LL(1) or LR(1) grammar), you are better off using a generator. It turns
out that "big" compiler parsers are often written by hand, because 1)
grammars are often ambiguous (sometimes at the lexical level), 2)
ambiguities are easier to resolve "by hand" because you can take more
context into account, and 3) hand-written parsers give you more control
over error recovery. (Also because memory is now big enough to hold a
parse tree.)


For C/C++ parsers, add the fact that there are several grammatical
layers (e.g., OpenMP pragmas), and/or you need details that automated
parsing usually omits (e.g., for diagnostics and tools). That's why both
gcc and clang parsers are hand-written.


The official Python implementation used an (auto-generated) LL(1) parser
until recently, when it switched to a PEG parser (I have to admit I don't
understand their motivation for doing this -- it is explained in a
document called PEP 617). It is still auto-generated as far as I
understand.


-- Alain.

