Subject: Syntaxis.jar; LALR parsing for languages with unreserved keywords
From: "Ev. Drikos" <drikosev@otenet.gr>
Newsgroups: comp.compilers
Date: Wed, 2 Nov 2011 19:40:12 +0200
Organization: An OTEnet S.A. customer
Keywords: available, parse, Java
Posted-Date: 02 Nov 2011 22:44:14 EDT
Hello,
This message describes a new feature of the parser/scanner generator suite
"Syntaxis.jar".
This feature can help you parse languages with unreserved keywords using an
LALR parser and a tokenizer. To use it, carry out the following steps:
1) Use the difference operator "but not" ("-=") to exclude any reserved
words from identifiers, and activate the scanner generator option that
reports all conflicting tokens.
2) In the parser generator, activate the option
"Shift Simultaneously Conflicting Tokens".
3) Give the full path name of the document that contains the lexical rules.
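
As a rough illustration of step 1, the scanner option can be pictured as returning, for each lexeme, every token kind that matches it. The sketch below is written in Python and is not Syntaxis.jar code; the function name `scan`, the result layout, and the single-character punctuation tokens are assumptions made for the example.

```python
import re

# Keyword spellings taken from the lexical conventions at the end of this post.
KEYWORDS = ("BEGIN", "END", "ASSERT")

def scan(text):
    """Return (lexeme, kinds) pairs; kinds holds every token kind that matches."""
    result = []
    for match in re.finditer(r"[A-Z]+|[;=()]|\S", text):
        lexeme = match.group()
        if re.fullmatch(r"[A-Z]+", lexeme):
            kinds = {"IDENTIFIER"}
            if lexeme in KEYWORDS:
                kinds.add(lexeme)  # conflict: keyword and identifier both match
        elif lexeme in ";=()":
            kinds = {lexeme}  # punctuation stands for itself
        else:
            raise ValueError(f"unexpected character {lexeme!r}")
        result.append((lexeme, kinds))
    return result

for lexeme, kinds in scan("BEGIN X = END ;"):
    print(lexeme, sorted(kinds))
```

In this sketch a lexeme such as END is reported with two kinds, END and IDENTIFIER, so the decision between them is left to the parser.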
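At the end of this message there is a small, elegant grammar that I found in
a paper titled "LALR parsing for languages without reserved words".
In this example, adding a new production for identifiers, in which all
unreserved keywords are listed as alternatives of the token "identifier",
would introduce a shift/reduce conflict. For instance, after ASSERT a
following "(" could either open a parenthesized expression or continue a
reference of the form ASSERT ( expression ).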
With option (2) above activated, and without restating the grammar,
the LALR builder of "Syntaxis.jar" builds a parsing table without conflicts.
As a result, the generated LALR parser accepts keywords as identifiers.
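
How the option works inside "Syntaxis.jar" is not described here. One way to picture the parser side, offered only as an assumption, is that the token kinds reported by the scanner are filtered by what the current parser state is able to shift. In the Python sketch below, the state names and the `EXPECTED` table are hypothetical and were written by hand from the syntax rules of the example grammar.

```python
# Hypothetical sets of terminals that two parser states can shift,
# derived by hand from the example grammar (not generated by any tool).
EXPECTED = {
    "statement_start": {"IDENTIFIER", "ASSERT"},
    "after_equals": {"IDENTIFIER", "("},
}

def viable(candidates, state):
    """Keep only the token kinds that the given state is able to shift."""
    return candidates & EXPECTED[state]

# END right after "=" can only be read as an identifier.
print(sorted(viable({"END", "IDENTIFIER"}, "after_equals")))
# ASSERT at the start of a statement is still viable under both readings.
print(sorted(viable({"ASSERT", "IDENTIFIER"}, "statement_start")))
```

The second call leaves two kinds standing, which is presumably the situation in which the builder has to follow both tokens at once; the sketch does not model that step.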
As another example, if you build an ISO SQL 2003 parser with the technique
described in this message instead of adding a new production for
identifiers, the LALR table can be 5.84 times smaller.
Constructive feedback is welcome.
Best Regards,
Ev. Drikos
A) Syntax Rules
-----------------------------------------------------------------------
grm ::=
      program
program ::=
      BEGIN statements END
statements ::=
      statement ; statements
    | statement ;
statement ::=
      reference = expression
    | ASSERT expression
reference ::=
      IDENTIFIER
    | IDENTIFIER ( expression )
expression ::=
      ( expression )
    | reference
B) Lexical Conventions
-----------------------------------------------------------------------
#ignore spaces
token ::=
      BEGIN
    | END
    | ASSERT
    | spaces
    | IDENTIFIER
BEGIN ::=
      B E G I N
END ::=
      E N D
ASSERT ::=
      A S S E R T
spaces ::=
      { \t | \n | \r | \s }...
IDENTIFIER ::=
      { A .. Z }...
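
To experiment with which inputs the grammar above admits once keywords may also stand for identifiers, a small recognizer can help. The Python sketch below uses backtracking recursive descent, not LALR, and is unrelated to the code that "Syntaxis.jar" generates; the names `kinds` and `accepts` are assumptions, and characters outside the grammar's alphabet are simply skipped.

```python
import re

KEYWORDS = {"BEGIN", "END", "ASSERT"}

def kinds(lexeme):
    """All token kinds a lexeme may stand for."""
    if lexeme.isalpha():
        return {"IDENTIFIER"} | ({lexeme} & KEYWORDS)
    return {lexeme}  # punctuation stands for itself

def accepts(text):
    toks = [kinds(x) for x in re.findall(r"[A-Z]+|[;=()]", text)]

    # Each function yields every position at which its phrase can end.
    def term(kind, i):
        if i < len(toks) and kind in toks[i]:
            yield i + 1

    def reference(i):
        for j in term("IDENTIFIER", i):
            yield j
            for k in term("(", j):
                for m in expression(k):
                    yield from term(")", m)

    def expression(i):
        for j in term("(", i):
            for k in expression(j):
                yield from term(")", k)
        yield from reference(i)

    def statement(i):
        for j in reference(i):
            for k in term("=", j):
                yield from expression(k)
        for j in term("ASSERT", i):
            yield from expression(j)

    def statements(i):
        for j in statement(i):
            for k in term(";", j):
                yield k
                yield from statements(k)

    def program(i):
        for j in term("BEGIN", i):
            for k in statements(j):
                yield from term("END", k)

    return any(end == len(toks) for end in program(0))

print(accepts("BEGIN END = BEGIN ; ASSERT ( ASSERT ) ; END"))
print(accepts("BEGIN X = ; END"))
```

The first input uses END, BEGIN and ASSERT in identifier positions and is recognized; the second is rejected because no expression follows "=".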