a java parser generator

"Ev. Drikos" <drikosev@otenet.gr>
Fri, 29 Apr 2011 13:37:45 +0300

          From comp.compilers

Related articles
a java parser generator drikosev@otenet.gr (Ev. Drikos) (2011-04-29)
| List of all articles for this month |
From: "Ev. Drikos" <drikosev@otenet.gr>
Newsgroups: comp.compilers
Date: Fri, 29 Apr 2011 13:37:45 +0300
Organization: An OTEnet S.A. customer
Keywords: Java, parse, tools
Posted-Date: 02 May 2011 00:50:38 EDT

Hi,


The "Syntaxis.jar" parser/scanner generator suite has a new LR generator
that creates java packed parsers.


The basic features of such parsers/scanners are listed at the end of this
message along with two small examples that can help you figure out how the
LR Builders save some table space. The LR generator uses ranges to pack the
shift/reduce table.


This generator supports also "bison" compatible references to semantic
values; thanks to a small partial java parser created by this generator.


All features described are optional. I don't claim to be an expert in the
topic and constructive feedback is welcome. The demo version of the
program, "SyntaxisDemo.jar", is available for evaluation, on request.


Best Regards,
Ev. Drikos
--------------------------------------------------------------------------------
Parser Features:
-Target Language: Java.
-Suitable for parsing small strings.
-Algorithms: Optimized (or standard) SLR, LALR, and Canonical LR.
-Pure BNF.
-Scannerless Mode (Note: the scanner generator can export DFA's as Left
Recursive LALR(1) grammars; backtracking isn't supported).
-Ranges, Difference, Intersection, Follows Restrictions for terminals.
-Spacing/Comments: In scannerless grammars as preprocessor definitions.
-Panic Mode Recovery.
-Elimination of Single Reductions without Semantic Actions.
-Modifiable; it can be subclassed.
-Conflicts resolution for expressions; optionally, the "yacc" approach.


Features not supported:
-Mid Action Rules.
-Union Data Types.
-Precedence and Associatively Declarations.


Scanner Features:
-Embedded; it supports Intersection/Difference of regular grammars.
-Tokens Numbering; Manual/Automatic Synchronization with the parser.
-Return Statements; Optional. (skip comments/spaces declarations)
-The Lookahead operator.
-Optionally, the scanner can return multiple/alternative tokens for the same
portion of the input. The parser examines such alternative tokens on
failure.


Below are two grammar examples from the book:
  "Compilers, Principles Techniques, and Tools".
2) Table Sizes for an LR(1) grammar; example 4.44.
2.1) The Syntax Rules.
<S'> ::= S
S ::= a A d | b B d | a B e | b A e
A ::= c
B ::= c


2.2) The Size of Parsing Tables.
2.2.1) Standard algorithm; States/Conflicts:
SLR: 14/1,
LALR: 14/1, and (cross checked with bison).
CLR: 15/0.
2.2.2) Eliminate Single Reductions; States/Conflicts:
SLR: 10/0,
LALR: 10/0, and
CLR: 10/0.
2.2.3) Optimize (Resolves Conflicts among others); States/Conflicts:
SLR: 14/1,
LALR: 15/0, and
CLR: 15/0.
2.2.4) Eliminate Single + Override Equivalent Last Items:
SLR: 7/0,
LALR: 7/0, and
CLR: 7/0.
Note: The first two alternatives of "S" have equal number of symbols, equal
last items, and don't have different semantic actions, thus:
The dotted item S-> b .B d on "B" goes to S->a A .d
Otherwise, the dotted item S-> b B .d on "d" could go to S->a A d.
If you know that this technique has a different name, please tell me.


2.2.5) Eliminate Single + Override + Optimize:
SLR: 7/0,
LALR: 7/0, and
CLR: 7/0.


3) The grammar of a basic Calculator.
It is the equivalent of the example 4.52 (Fig. 4.59).
It has 17 states.
3.1) The Syntax Rules.
grm ::= lines
lines ::= lines expr \n /* printf("%g\n" , $2 ); */
          | lines \n
          | NONE
          | ERROR \n /* printf( "reenter last line:\n") ; error=false; */
expr ::=
    expr <-|+> expr /* if (*str(2)=='+') $0 += $3 ; else $0 -= $3;*/
  |expr <*|/> expr /* if (*str(2)=='*') $0 *= $3 ; else $0 /= $3;*/
          | factor
<-|+> ::= - | +
<*|/> ::= *| /
factor ::= ( expr ) /* $$ = $2 ; */
          | NUMBER
          | - factor /* $$ = - $2 ; */
3.2) The Lexical Rules.
(The declaration "#ignore space" instructs the scanner to skip spaces.)
#ignore space
token ::= NUMBER /* yylval = atof( str(1) ) ; */ | space
NUMBER ::= { [ { 0 .. 9 }... ] [ { . } { 0 .. 9 }... ] } -= { NONE }
space ::= \t | \s| \r



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.