Grammars for LL(1) grammars?

Lasse "Hillerøe" Petersen <lhp+news@toft-hp.dk>
24 May 2004 00:23:43 -0400

          From comp.compilers

Related articles
Grammars for LL(1) grammars? lhp+news@toft-hp.dk (LasseHillerøePetersen) (2004-05-24)
Re: Grammars for LL(1) grammars? lhp+news@toft-hp.dk (Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen) (2004-07-13)
van Wijngaarden Grammars, Was: Grammars for LL(1) grammars? wb@arb-phys.uni-dortmund.de (2004-07-14)
Re: van Wijngaarden Grammars, Was: Grammars for LL(1) grammars? lhp+news@toft-hp.dk (Lasse =?ISO-8859-1?Q?Hiller=F8e?= Petersen) (2004-07-28)
| List of all articles for this month |
From: Lasse "Hillerøe" Petersen <lhp+news@toft-hp.dk>
Newsgroups: comp.compilers
Date: 24 May 2004 00:23:43 -0400
Organization: Private
Keywords: LL(1), parse
Posted-Date: 24 May 2004 00:23:43 EDT

Just for fun, I am writing a (sort of) parser generator for LL(1)
languages. (Rather than generating a parser from an EBNF grammar
description with embedded actions, I want my parser to eventually
accept two inputs: a grammar in EBNF, and the input text to be parsed,
and output a version of the input text marked up in XML style with the
nonterminals as elements, ie: 2+3*4 =>
<expr><sum><prod><fac>2</fac></prod>+<prod><fac>3</fac>*<fac>4</fac></pro
d></sum></expr>.)


A thought occured to me, while playing around with all this (especially
the embedded parser for parsing the EBNF grammar), probably inspired by
my numerous failed attempts at really understanding the Algol68 grammar.


One thing I seem to have understood from that grammar is that
productions which may produce the empty string are denoted as FOOETY.


Now, while writing my parser generator, and of course having to
calculate nullability, and first and follow sets, this seemed to click.


A simple self-describing LL(1) grammar for EBNF could be:
Grammar: { Rule }.
Rule: Nonterminal ":" Productions "." .
Productions: Sequence { "|" Sequence }.
Sequence: { Element }.
Element: "{" Productions "}" | Symbol.
Symbol: Terminal | Nonterminal.


I think that the following grammar for EBNF will ensure that all
nullables are named FooETY.


GrammarETY: { Rule | Rule0 }.
Rule: Nonterminal ":" Productions ".".
Rule0: Nonterminal0 ":" Productions0ETY ".".
Productions: Sequence { "|" Sequence }.
Productions0ETY: Sequence0ETY { "| Sequence }.
Sequence: Element Sequence0ETY.
Sequence0ETY: { Element0 }.
Element: Terminal | Nonterminal.
Element0: "{" Productions "}" | Symbol.
Symbol: Terminal | Nonterminal | Nonterminal0.


where the symbol set is divided in three sets, names for nonterminals
ending in ETY for nullable productions.


My first question is: is this correct? And except requiring that a
nullable sequence is put before any non-nullable, and possibly some
other minor modifications, will it handle all LL(1) grammars? (My guess
is that it will?)


My second question is: Can this method be extended to also cover other
aspects of LL(1) grammars, perhaps by parametrising the non-nullable
nonterminals with their first sets? Or, in other words: is there some
sort of LL(1) grammar for some kind of EBNF, which will accept only EBNF
texts describing LL(1) grammars, and which can be written in itself?


-Lasse


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.