Trouble understanding LL(k) vs. LALR(k) etc.

Johnathan <zork_666@nospammail.net>
11 Mar 2004 12:50:54 -0500

          From comp.compilers

Related articles
Trouble understanding LL(k) vs. LALR(k) etc. zork_666@nospammail.net (Johnathan) (2004-03-11)
Re: Trouble understanding LL(k) vs. LALR(k) etc. maeder@tzi.de (Christian Maeder) (2004-03-15)
Re: Trouble understanding LL(k) vs. LALR(k) etc. cfc@shell01.TheWorld.com (Chris F Clark) (2004-03-15)
Re: Trouble understanding LL(k) vs. LALR(k) etc. cdc@maxnet.co.nz (Carl Cerecke) (2004-03-19)
Re: Trouble understanding LL(k) vs. LALR(k) etc. derkgwen@HotPOP.com (Derk Gwen) (2004-03-26)
Re: Trouble understanding LL(k) vs. LALR(k) etc. cfc@shell01.TheWorld.com (Chris F Clark) (2004-03-26)
Re: Trouble understanding LL(k) vs. LALR(k) etc. f40mczf02@sneakemail.com (Karsten Nyblad) (2004-04-03)
[7 later articles]
| List of all articles for this month |

From: Johnathan <zork_666@nospammail.net>
Newsgroups: comp.compilers
Date: 11 Mar 2004 12:50:54 -0500
Organization: Compilers Central
Keywords: parse, question
Posted-Date: 11 Mar 2004 12:50:54 EST

I'm having difficulty following compiler textbooks that use abstract
notations instead of more concrete examples. For instance, I wouldn't
know how to tell an LL(k) grammar from an LR(k)/LALR(k) grammar, nor
how to write an LL(k) grammar and check to see if it really is LL(k).
I have a vague idea of left factorisation and the principles of
recursive descent parsing, and first and follow sets. Is that all I
need to know to determine if a language grammar is LL-able? What is
the difference between left and right recursion?


There are some good grammars and parser generators out there, but from
just reading the example grammars that come with these tools I am
having difficulty spotting what makes LL different from LALR or LR.
First I'd like to concentrate on LL(k), since it has been said that
this class of language grammar can be parsed with hand-written
recursive descent parsers. I would rather learn the manual way first
and try to get a deep understanding/enlightenment before using tools
like SableCC or Bison or whatever.


I'd like to write a grammar for a dialect of BASIC that is LL(k) and
write a parser myself. Here is my first effort at a grammar. I've
left out the tokens because it's fairly obvious what they are. I've
also added a bit of pseudocode that describes what I think the
structure of the parser would look like.




Program = <Options> <Statements> <SubFuncDecls> EOF


Options = OPTION <OptType>
OptType = EXPLICIT { | add more }


Statements = <Statement> | <Statements> <Statement>


Statement = <AssignExpr> | <GotoStmt> | <IfStmt> |
<ForStmt> | <WhileStmt> | <RptStmt> |
<PrintStmt> | <InputStmt> | <ReadStmt> |
<WriteStmt> | <GosubStmt> | <DeclStmt>


DeclStmt = VAR <VarList> AS <TypeName> | <ConstDecl>
ConstDecl = CONST IDENTIFIER AS <TypeName> ASSIGNOP <Expr>


VarList = IDENTIFIER | IDENTIFIER , IDENTIFIER


GotoStmt = GOTO <Label>
GosubStmt = GOSUB IDENTIFIER


IfStmt = IF <BoolExpr> THEN <Statements> <OptIf>
<OptElse> ENDIF


// Will empty productions here cause a problem? Maybe not because
// we can peek ahead to see if there's an ENDIF token
OptIf = ELSEIF <BoolExpr> THEN <Statements> | <Empty>
OptElse = ELSE <Statements> | <Empty>


ForStmt = FOR <Variable> ASSIGNOP <Expr> <OptStep>
<Statements> NEXT


OptStep = STEP <ArithExpr> | <Empty>


WhileStmt = WHILE <BoolExpr> DO <Statements> WEND


RptStmt = REPEAT <Statements> UNTIL <BoolExpr>
PrintStmt = PRINT <PrintList>
InputStmt = INPUT <String> <OptVarList>
ReadStmt = READ FILEHANDLE <Variable> {FIXME}
WriteStmt = WRITE FILEHANDLE <Variable> |
WRITE FILEHANDLE <Literal> {FIXME}


AssignExpr = <Variable> ASSIGNOP <Expr>
Expr = <BoolExpr> | <ArithExpr>


LogicExpr = TRUE | FALSE
BoolExpr = <LogicExpr> | LOGICALOP <LogicExpr> |
NOT <LogicExpr>


// ADDOP is + or -
// MULOP is *, / or mod
ArithExpr = <Term> | ADDOP <Term>
Term = <Factor> | MULOP <Factor>


// ADDOP here is saying +NUM or -NUM
Factor = ADDOP <Atom> | EXPOP <Atom> | <Atom>


Atom = <Variable> | <Number>
| LPAREN <ArithExpr> RPAREN | <FuncCall>


FuncCall = IDENTIFIER ( <ParamList> )
SubCall = CALL IDENTIFIER




// A first go at thinking about how this recursive descent parser
// will be structured
PARSER STRUCTURE


Program()
Options()
Statements()
Procedures()
Match( EOF )


Options()
If Tok Is TOK_OPTION Then
Match( TOK_OPTION )
ProgramOptions()
End If


Statements()
// Something about having subs or funcs here instead of
// a program?
While Tok In Set ( Statement First Set )
Statement()


Procedures()
While Tok Is TOK_SUB Or TOK_FUNCTION
If Tok Is TOK_SUB
Match( SUB )
SubStatements()
Match( ENDSUB )
Else
Match( FUNCTION )
ParamList()
Match( RET_TYPE )
FuncStatements()
Match( END_FUNC )
End If




---- END ----


What I am having problems with in the grammar above is the recursive
parts. For instance, is this valid?


DeclStmt = VAR <VarList> AS <TypeName> | <ConstDecl>
ConstDecl = CONST IDENTIFIER AS <TypeName> ASSIGNOP <Expr>


VarList = IDENTIFIER | IDENTIFIER , IDENTIFIER




How can I check to see if it produces things like:


Dim myVar, var2, var3 As String
Dim var4 As Integer


(I use VAR to mean the DIM keyword, I just haven't fixed the token names
to reflect that yet).


I am thinking that my grammar will allow things like this:


Dim Var1, Var2 Var3, Var4 As Whatever


... because of the VarList rule. Note the missing comma between Var2
and Var3. I thought this would be better:


DeclStmt ::= DIM <Varname> { ',' <Varname> } As TYPENAME


... assuming that the part in { } is the part that's recursive.


Is my grammar LL(k)? Why or why not? I am not really sure where to go
from here!


Thanks very much for your help.
Johnathan


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.