# Re: Parsing implicit operators with recursive descent?

## torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)Mon, 09 Feb 2009 12:26:02 +0100

From comp.compilers

Related articles
Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2009-02-06)
Re: Parsing implicit operators with recursive descent? mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2009-02-07)
Re: Parsing implicit operators with recursive descent? armelasselin@hotmail.com (Armel) (2009-02-07)
Re: Parsing implicit operators with recursive descent? torbenm@pc-003.diku.dk (2009-02-09)
Re: Parsing implicit operators with recursive descent? barry.j.kelly@gmail.com (Barry Kelly) (2009-02-12)
Re: Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2010-01-30)
Re: Parsing implicit operators with recursive descent? kkylheku@gmail.com (Kaz Kylheku) (2010-02-01)
| List of all articles for this month |

 From: torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen) Newsgroups: comp.compilers Date: Mon, 09 Feb 2009 12:26:02 +0100 Organization: Department of Computer Science, University of Copenhagen References: 09-02-013 Keywords: parse, LL(1) Posted-Date: 10 Feb 2009 09:46:58 EST

> Is it possible to parse implicit operators, like the regular
> expression concatenation operator, with a recursive descent parser?

Yes, you can do it mostly by standard grammar-rewriting techniques.

Let us take regular expressions as an example. We start with the
grammar:

R -> R | R
R -> R R
R -> R *
R -> a

Note that the | above is a terminal. a stands for any alphabet symbol.

We start by eliminating ambiguity. We choose (as normal) | to bind
weaker than concatenation which binds weaker than Kleene-star. We
also choose | and concatenation to be right-associative. They are
semantically fully associative, so we could choose any. But we get
less left-recursion to eliminate later by choosing them to be
right-assciative.

R -> C | R
R -> C
C -> S C
C -> S
S -> S *
S -> a

We now do left factorisation and left-recursion elimination:

R -> C R'
R' -> | R
R' ->
C -> S C'
C' -> C
C' ->
S -> a S'
S' -> * S'
S' ->

To see if this is LL(1), we need to calculate FIRST and FOLLOW.
Below, I have for each production added the FIRST set:

R -> C R' {a}
R' -> | R {|}
R' -> {}
C -> S C' {a}
C' -> C {a}
C' -> {}
S -> a S' {a}
S' -> * S' {*}
S' -> {}

FOLLOW(R) = {\$}
FOLLOW(R') = {\$}
FOLLOW(C) = {\$,|}
FOLLOW(C') = {\$,|}
FOLLOW(S) = {\$,|,a}
FOLLOW(S') = {\$,|,a}

This leads to the following parse table:

| a * | \$
--+------------------------------
R | R->CR'
R'| R'->|R R'->
C | C->SC'
C'| C'->C C'-> C'->
S | S->aS'
S'| S'-> S'->*S' S'-> S'->

which has no conflicts, so the grammar is LL(1).

The parse table can easily be rewritten to a recursive descent parser.

Torben

Post a followup to this message