Re: Parsing implicit operators with recursive descent?

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Mon, 09 Feb 2009 12:26:02 +0100

          From comp.compilers

Related articles
Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2009-02-06)
Re: Parsing implicit operators with recursive descent? mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2009-02-07)
Re: Parsing implicit operators with recursive descent? armelasselin@hotmail.com (Armel) (2009-02-07)
Re: Parsing implicit operators with recursive descent? torbenm@pc-003.diku.dk (2009-02-09)
Re: Parsing implicit operators with recursive descent? barry.j.kelly@gmail.com (Barry Kelly) (2009-02-12)
Re: Parsing implicit operators with recursive descent? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2010-01-30)
Re: Parsing implicit operators with recursive descent? kkylheku@gmail.com (Kaz Kylheku) (2010-02-01)
| List of all articles for this month |
From: torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Newsgroups: comp.compilers
Date: Mon, 09 Feb 2009 12:26:02 +0100
Organization: Department of Computer Science, University of Copenhagen
References: 09-02-013
Keywords: parse, LL(1)
Posted-Date: 10 Feb 2009 09:46:58 EST

"Johann 'Myrkraverk' Oskarsson" <johann@myrkraverk.com> writes:




> Is it possible to parse implicit operators, like the regular
> expression concatenation operator, with a recursive descent parser?


Yes, you can do it mostly by standard grammar-rewriting techniques.


Let us take regular expressions as an example. We start with the
grammar:




    R -> R | R
    R -> R R
    R -> R *
    R -> a


Note that the | above is a terminal. a stands for any alphabet symbol.


We start by eliminating ambiguity. We choose (as normal) | to bind
weaker than concatenation which binds weaker than Kleene-star. We
also choose | and concatenation to be right-associative. They are
semantically fully associative, so we could choose any. But we get
less left-recursion to eliminate later by choosing them to be
right-assciative.


    R -> C | R
    R -> C
    C -> S C
    C -> S
    S -> S *
    S -> a


We now do left factorisation and left-recursion elimination:


    R -> C R'
    R' -> | R
    R' ->
    C -> S C'
    C' -> C
    C' ->
    S -> a S'
    S' -> * S'
    S' ->


To see if this is LL(1), we need to calculate FIRST and FOLLOW.
Below, I have for each production added the FIRST set:




    R -> C R' {a}
    R' -> | R {|}
    R' -> {}
    C -> S C' {a}
    C' -> C {a}
    C' -> {}
    S -> a S' {a}
    S' -> * S' {*}
    S' -> {}


The FOLLOW sets are:


  FOLLOW(R) = {$}
  FOLLOW(R') = {$}
  FOLLOW(C) = {$,|}
  FOLLOW(C') = {$,|}
  FOLLOW(S) = {$,|,a}
  FOLLOW(S') = {$,|,a}


This leads to the following parse table:


      | a * | $
  --+------------------------------
  R | R->CR'
  R'| R'->|R R'->
  C | C->SC'
  C'| C'->C C'-> C'->
  S | S->aS'
  S'| S'-> S'->*S' S'-> S'->


which has no conflicts, so the grammar is LL(1).


The parse table can easily be rewritten to a recursive descent parser.


Torben



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.