Re: Grammar analysis

"Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen" <lhp+news@toft-hp.dk>
26 Nov 2004 22:46:20 -0500

          From comp.compilers

Related articles
Grammar analysis vbdis@aol.com (2004-11-20)
Re: Grammar analysis lhp+news@toft-hp.dk (Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen) (2004-11-26)
Re: Grammar analysis demakov@ispras.ru (Alexey Demakov) (2004-11-28)
Re: Grammar analysis vbdis@aol.com (2004-11-29)
Re: Grammar analysis lhp+news@toft-hp.dk (Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen) (2004-12-01)
Re: Grammar analysis vbdis@aol.com (2004-12-01)
| List of all articles for this month |

From: "Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen" <lhp+news@toft-hp.dk>
Newsgroups: comp.compilers
Date: 26 Nov 2004 22:46:20 -0500
Organization: No Organization
References: 04-11-090
Keywords: parse
Posted-Date: 26 Nov 2004 22:46:20 EST

vbdis@aol.com (VBDis) wrote:


> I'm just about to implement an grammar analyzer, and wonder about some
> of the algorithms described in the according literature.
>
> The computation of the First and Follow sets is not trivial,


I hope you meant trivial, I managed to write some code that seems to
work, so it must be! ;-)


> nonetheless I don't understand the "repeat until nothing changes"
> loops. Are these abbreviations of otherwise higher order algorithms,
> or did I miss newer and more straight algorithms?
>
> DoDi


Perhaps I may be allowed to answer by posting a short code example from
my LL(1) parser in the works, and add a question of my own. Please note
that I have used suggestive macros, this is code that works, but I omit
the definitions and declaration in hope that the algorithm itself is
clear enough. Here's the code:


void discover_first() {
        int s;
        Prod *p;
        /* compute FIRST sets for all symbols */


        /* for all terminal s: FIRST(s) = { s } */
        for(EACH_SYMBOL(s)) {
                FIRST(s) = bits_empty();
                if(TERMINAL(s)) bits_set1(&FIRST(s),s);
        }


        changed:
        for(EACH_PRODUCTION(s,p)) {
                /* The s: p rule can be one of two variants:
                        s: p1 p2 .. pj where all pi are nullable
                    or
                        s: p1 p2 .. pj r1 r2 ... where all pi except pj are nullable
                    FIRST(s) is union of FIRST(pi) for all i
                */
                int j;
                bits_t newfirst;


                newfirst = FIRST(s);


                for(j = 0 ; j < p->count && NULLABLE(p->seq[j]) ; j++) {
                        newfirst = bits_union(newfirst,FIRST(p->seq[j]));
                }
                if( j < p->count )
                        newfirst = bits_union(newfirst,FIRST(p->seq[j]));


                if( ! bits_equal(newfirst, FIRST(s) ) ) {
                        FIRST(s) = newfirst;
                        goto changed;
                }
        }
        /* now decorate each production with first sets
              to be used for selection decision */
        /* code for this deleted for sake of brevity */
}


As can be seen, each time newfirst has something added to it, the loop
restarts with a simple goto. So each time a production's FIRST changes,
the iteration over productions is restarted.


Another approach would be to just note the change in place of the goto:
{something_changed = 1;}


and wrap for(EACH_PRODUCTION(s,p)) with:


something_changed = 1;
while(something_changed) { something_changed = 0; ... }


this would finish the iteration over all productions before restarting
with the first production. In both cases it repeats until nothing
changes.


I don't think it matters much; but still I'd like to ask more
knowledgeable people: is there _any_ _significant_ difference in
efficiency between the two approaches?


I think I've seen references to "efficient" computation of FIRST and
FOLLOW; but given the simplicity of the approach above, when would such
efficient algorithms be worthwhile?


-Lasse


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.