Syntaxis.jar; LALR parsing for languages with unreserved keywords

"Ev. Drikos" <drikosev@otenet.gr>
Wed, 2 Nov 2011 19:40:12 +0200

          From comp.compilers

From: "Ev. Drikos" <drikosev@otenet.gr>
Newsgroups: comp.compilers
Date: Wed, 2 Nov 2011 19:40:12 +0200
Organization: An OTEnet S.A. customer
Keywords: available, parse, Java
Posted-Date: 02 Nov 2011 22:44:14 EDT

Hello,


This message describes a new feature of the parser/scanner generator suite
"Syntaxis.jar".


This feature can help you parse languages with unreserved keywords using an
LALR parser and a tokenizer. To use it, carry out the following steps:


1) Use the difference operator "but not" ("-=") to exclude any reserved
words from identifiers and activate the scanner generator option to report
all conflicting tokens.


2) In the parser generator, activate the option
"Shift Simultaneously Conflicting Tokens".


3) Give the full path name of the document with the lexical rules.
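
To make step (1) concrete, here is a minimal Python sketch of a scanner that
reports every token a lexeme could be. It is not the code "Syntaxis.jar"
generates; the names scan, KEYWORDS, PUNCTUATION and the "reserved" parameter
are hypothetical, the punctuation set is assumed from the syntax rules at the
end of this message, and "reserved" is assumed to be a subset of KEYWORDS.

```python
import re

# Hypothetical token table mirroring the lexical rules in section B.
KEYWORDS = ("BEGIN", "END", "ASSERT")
PUNCTUATION = ";=()"          # assumed literals, taken from the syntax rules
WORD = re.compile(r"[A-Z]+")  # IDENTIFIER ::= { A .. Z }...


def scan(text, reserved=frozenset()):
    """Return (lexeme, candidate_tokens) pairs, using the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in " \t\r\n":            # "#ignore spaces"
            pos += 1
        elif ch in PUNCTUATION:
            tokens.append((ch, frozenset([ch])))
            pos += 1
        else:
            match = WORD.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected character {ch!r} at offset {pos}")
            lexeme = match.group()
            candidates = set()
            if lexeme in KEYWORDS:
                candidates.add(lexeme)
            # "but not": reserved words are excluded from IDENTIFIER.
            if lexeme not in reserved:
                candidates.add("IDENTIFIER")
            tokens.append((lexeme, frozenset(candidates)))
            pos = match.end()
    return tokens
```

Calling scan("BEGIN X") reports "BEGIN" as both BEGIN and IDENTIFIER, while
"X" is only an IDENTIFIER. Passing reserved={"END"} removes IDENTIFIER from
the candidates of "END", which is the effect the difference operator is
meant to have on a reserved word.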


At the end of this message there is a small, elegant grammar that I found in
a paper titled "LALR parsing for languages without reserved words".


In this example, adding a new production for identifiers, one that lists all
unreserved keywords as alternatives of the token "identifier", would
introduce a shift/reduce conflict.


With option (2) above activated, and without restating the grammar, the LALR
builder of "Syntaxis.jar" builds a parsing table without conflicts. As a
result, the generated LALR parser accepts keywords as identifiers.
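
To show which inputs such a parser should accept, here is a small Python
recognizer for the grammar at the end of this message. It is only a sketch:
it uses backtracking recursive descent rather than an LALR table, so it says
nothing about how "Syntaxis.jar" works internally. All function names are
hypothetical, and the tokenizer is a simplification that silently skips any
character it does not know.

```python
import re


def tokenize(text):
    # Simplified scanner: upper-case words and the four punctuation marks.
    return re.findall(r"[A-Z]+|[;=()]", text)


def is_tok(tokens, i, kind):
    """True if tokens[i] can be read as the terminal `kind`."""
    if i >= len(tokens):
        return False
    if kind == "IDENTIFIER":
        return tokens[i].isalpha()   # keywords qualify as identifiers too
    return tokens[i] == kind


# Each function yields every position at which its nonterminal can end.
def reference(t, i):
    if is_tok(t, i, "IDENTIFIER"):
        yield i + 1                               # IDENTIFIER
        if is_tok(t, i + 1, "("):                 # IDENTIFIER ( expression )
            for j in expression(t, i + 2):
                if is_tok(t, j, ")"):
                    yield j + 1


def expression(t, i):
    if is_tok(t, i, "("):                         # ( expression )
        for j in expression(t, i + 1):
            if is_tok(t, j, ")"):
                yield j + 1
    yield from reference(t, i)                    # reference


def statement(t, i):
    for j in reference(t, i):                     # reference = expression
        if is_tok(t, j, "="):
            yield from expression(t, j + 1)
    if is_tok(t, i, "ASSERT"):                    # ASSERT expression
        yield from expression(t, i + 1)


def statements(t, i):
    for j in statement(t, i):
        if is_tok(t, j, ";"):
            yield j + 1                           # statement ;
            yield from statements(t, j + 1)       # statement ; statements


def program(t, i):
    if is_tok(t, i, "BEGIN"):
        for j in statements(t, i + 1):
            if is_tok(t, j, "END"):
                yield j + 1


def accepts(text):
    t = tokenize(text)
    return any(j == len(t) for j in program(t, 0))
```

With this sketch, accepts("BEGIN ASSERT ( X ) ; END") reads ASSERT as a
keyword, while accepts("BEGIN ASSERT ( X ) = END ; END") reads both ASSERT
and END as identifiers. The two inputs share the prefix "ASSERT ( X )", and
only the token after the closing parenthesis tells them apart.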


To give you another example: if you build an ISO SQL 2003 parser with the
technique described in this message, rather than by adding a new production
for identifiers, the LALR table can be 5.84 times smaller.


Constructive feedback is welcome.


Best Regards,
Ev. Drikos






A) Syntax Rules
-----------------------------------------------------------------------
grm ::=
            program


program ::=
            BEGIN statements END


statements ::=
            statement ; statements
          | statement ;


statement ::=
            reference = expression
          | ASSERT expression


reference ::=
            IDENTIFIER
          | IDENTIFIER ( expression )


expression ::=
            ( expression )
          | reference






B) Lexical Conventions
-----------------------------------------------------------------------
#ignore spaces


token ::=
            BEGIN
          | END
          | ASSERT
          | spaces
          | IDENTIFIER


BEGIN ::=
            B E G I N


END ::=
            E N D


ASSERT ::=
            A S S E R T


spaces ::=
            { \t | \n | \r | \s }...


IDENTIFIER ::=
            { A .. Z }...


