Re: Grammar analysis

"Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen" <lhp+news@toft-hp.dk>
26 Nov 2004 22:46:20 -0500

From comp.compilers

Related articles
Grammar analysis vbdis@aol.com (2004-11-20)
*Re: Grammar analysis lhp+news@toft-hp.dk (Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen)* (2004-11-26)**
Re: Grammar analysis demakov@ispras.ru (Alexey Demakov) (2004-11-28)
Re: Grammar analysis vbdis@aol.com (2004-11-29)
Re: Grammar analysis lhp+news@toft-hp.dk (Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen) (2004-12-01)
Re: Grammar analysis vbdis@aol.com (2004-12-01)

| List of all articles for this month |

From:	"Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen" <lhp+news@toft-hp.dk>
Newsgroups:	comp.compilers
Date:	26 Nov 2004 22:46:20 -0500
Organization:	No Organization
References:	04-11-090
Keywords:	parse
Posted-Date:	26 Nov 2004 22:46:20 EST

vbdis@aol.com (VBDis) wrote:

> I'm just about to implement an grammar analyzer, and wonder about some
> of the algorithms described in the according literature.
>
> The computation of the First and Follow sets is not trivial,

I hope you meant trivial, I managed to write some code that seems to
work, so it must be! ;-)

> nonetheless I don't understand the "repeat until nothing changes"
> loops. Are these abbreviations of otherwise higher order algorithms,
> or did I miss newer and more straight algorithms?
>
> DoDi

Perhaps I may be allowed to answer by posting a short code example from
my LL(1) parser in the works, and add a question of my own. Please note
that I have used suggestive macros, this is code that works, but I omit
the definitions and declaration in hope that the algorithm itself is
clear enough. Here's the code:

void discover_first() {
        int s;
        Prod *p;
        /* compute FIRST sets for all symbols */

        /* for all terminal s: FIRST(s) = { s } */
        for(EACH_SYMBOL(s)) {
                FIRST(s) = bits_empty();
                if(TERMINAL(s)) bits_set1(&FIRST(s),s);
        }

        changed:
        for(EACH_PRODUCTION(s,p)) {
                /* The s: p rule can be one of two variants:
                        s: p1 p2 .. pj where all pi are nullable
                    or
                        s: p1 p2 .. pj r1 r2 ... where all pi except pj are nullable
                    FIRST(s) is union of FIRST(pi) for all i
                */
                int j;
                bits_t newfirst;

                newfirst = FIRST(s);

                for(j = 0 ; j < p->count && NULLABLE(p->seq[j]) ; j++) {
                        newfirst = bits_union(newfirst,FIRST(p->seq[j]));
                }
                if( j < p->count )
                        newfirst = bits_union(newfirst,FIRST(p->seq[j]));

                if( ! bits_equal(newfirst, FIRST(s) ) ) {
                        FIRST(s) = newfirst;
                        goto changed;
                }
        }
        /* now decorate each production with first sets
              to be used for selection decision */
        /* code for this deleted for sake of brevity */
}

As can be seen, each time newfirst has something added to it, the loop
restarts with a simple goto. So each time a production's FIRST changes,
the iteration over productions is restarted.

Another approach would be to just note the change in place of the goto:
{something_changed = 1;}

and wrap for(EACH_PRODUCTION(s,p)) with:

something_changed = 1;
while(something_changed) { something_changed = 0; ... }

this would finish the iteration over all productions before restarting
with the first production. In both cases it repeats until nothing
changes.

I don't think it matters much; but still I'd like to ask more
knowledgeable people: is there _any_ _significant_ difference in
efficiency between the two approaches?

I think I've seen references to "efficient" computation of FIRST and
FOLLOW; but given the simplicity of the approach above, when would such
efficient algorithms be worthwhile?

-Lasse

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Grammar analysis

"Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen" <lhp+news@toft-hp.dk>26 Nov 2004 22:46:20 -0500

"Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen" <lhp+news@toft-hp.dk>
26 Nov 2004 22:46:20 -0500