Mon, 09 Feb 2009 12:26:02 +0100

Related articles |
---|

Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2009-02-06) |

Re: Parsing implicit operators with recursive descent? mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2009-02-07) |

Re: Parsing implicit operators with recursive descent? armelasselin@hotmail.com (Armel) (2009-02-07) |

Re: Parsing implicit operators with recursive descent? torbenm@pc-003.diku.dk (2009-02-09) |

Re: Parsing implicit operators with recursive descent? barry.j.kelly@gmail.com (Barry Kelly) (2009-02-12) |

Re: Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2010-01-30) |

Re: Parsing implicit operators with recursive descent? kkylheku@gmail.com (Kaz Kylheku) (2010-02-01) |

From: | torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen) |

Newsgroups: | comp.compilers |

Date: | Mon, 09 Feb 2009 12:26:02 +0100 |

Organization: | Department of Computer Science, University of Copenhagen |

References: | 09-02-013 |

Keywords: | parse, LL(1) |

Posted-Date: | 10 Feb 2009 09:46:58 EST |

"Johann 'Myrkraverk' Oskarsson" <johann@myrkraverk.com> writes:

*> Is it possible to parse implicit operators, like the regular*

*> expression concatenation operator, with a recursive descent parser?*

Yes, you can do it mostly by standard grammar-rewriting techniques.

Let us take regular expressions as an example. We start with the

grammar:

R -> R | R

R -> R R

R -> R *

R -> a

Note that the | above is a terminal. a stands for any alphabet symbol.

We start by eliminating ambiguity. We choose (as normal) | to bind

weaker than concatenation which binds weaker than Kleene-star. We

also choose | and concatenation to be right-associative. They are

semantically fully associative, so we could choose any. But we get

less left-recursion to eliminate later by choosing them to be

right-assciative.

R -> C | R

R -> C

C -> S C

C -> S

S -> S *

S -> a

We now do left factorisation and left-recursion elimination:

R -> C R'

R' -> | R

R' ->

C -> S C'

C' -> C

C' ->

S -> a S'

S' -> * S'

S' ->

To see if this is LL(1), we need to calculate FIRST and FOLLOW.

Below, I have for each production added the FIRST set:

R -> C R' {a}

R' -> | R {|}

R' -> {}

C -> S C' {a}

C' -> C {a}

C' -> {}

S -> a S' {a}

S' -> * S' {*}

S' -> {}

The FOLLOW sets are:

FOLLOW(R) = {$}

FOLLOW(R') = {$}

FOLLOW(C) = {$,|}

FOLLOW(C') = {$,|}

FOLLOW(S) = {$,|,a}

FOLLOW(S') = {$,|,a}

This leads to the following parse table:

| a * | $

--+------------------------------

R | R->CR'

R'| R'->|R R'->

C | C->SC'

C'| C'->C C'-> C'->

S | S->aS'

S'| S'-> S'->*S' S'-> S'->

which has no conflicts, so the grammar is LL(1).

The parse table can easily be rewritten to a recursive descent parser.

Torben

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.