flex: nested scopes

Skyscraper System Administrator <bjdouma@xs4all.nl>
16 Apr 2005 11:14:08 -0400

          From comp.compilers


From: Skyscraper System Administrator <bjdouma@xs4all.nl>
Newsgroups: comp.compilers
Date: 16 Apr 2005 11:14:08 -0400
Organization: a training zoo
Keywords: lex, comment

Hi,


I wonder if anybody would like to comment on the following.


Take the following scan.l and parser.y (please never mind their other
shortcomings that aren't germane to the discussion below). They're
compiled into a little program called lextest.


Note: using flex version 2.5.4, bison 2.0.




scan.l:
-------
T_SET "set"
T_SYM "foo"
T_PAO \(
T_PAC \)
T_EQ =
T_COMMA ,
T_VAL [0]+|[1-9][0-9]*
T_WS [ \t]


    /* start conditions */
%x tok0
%x tok1
%x tok2
%x eq
%x val
%x comma
%x pao
%x pac


%%


    /*-------------------------------------------------*/


<INITIAL>{T_SET} { save_s; BEGIN( tok0 ); return( T_SET ); }
<tok0>{T_WS} { ; }
<tok0>{T_SYM} { save_s; BEGIN( tok1 ); return( T_SYM ); }
<tok1>{T_EQ} { save_s; BEGIN( eq ); return( T_EQ ); }


<eq>{T_PAO} { save_s; BEGIN( pao ); return( T_PAO ); }
<eq,val>{T_PAC} { save_s; BEGIN( pac ); return( T_PAC ); }
<eq,pao,comma>{T_VAL} { save_s; BEGIN( val ); return( T_VAL ); }
<eq,val>{T_COMMA} { save_s; BEGIN( comma ); return( T_COMMA ); }
<eq,pac>. { save_s; BEGIN( 0 ); }


    /*-- misc -----------------------------------------*/


<<EOF>> { yyterminate(); }
<comma,pao>\n|\0 { yyterminate(); exit( 0 ); }
<*>\n|\r { line_nr++; BEGIN( 0 ); }
<*>. { yyerror( yytext ); }






parser.y
--------
%token <op> T_SET
%token <val> T_VAL T_PAO T_PAC
%token <s_value> T_SYM
%token <c_value> T_COMMA T_EQ


/* non-terminals */
%type <op> set
%type <lhs> lhs sym
%type <val> rhs pao val pac
%type <c_value> eq


%%


input: '\n'
                  | op
                  | input op
                  | error { fprintf( stderr, "*** error ***\n" ); }
;
op: set lhs eq rhs
;
lhs: sym
                  | lhs T_COMMA sym
;
rhs: val
                  | pao rhs pac
                  | rhs T_COMMA rhs
;
sym: T_SYM { fprintf( stderr, "%s\n", $1 ); }
;
set: T_SET { fprintf( stderr, "%s\n", $1 ); }
;
val: T_VAL { fprintf( stderr, "%lu\n", strtoul( $1, NULL, 16 ) ); }
;
eq: T_EQ { fprintf( stderr, "%d\n", T_EQ ); }
;
pao: T_PAO { fprintf( stderr, "%d\n", VAL_PAO ); }
;
pac: T_PAC { fprintf( stderr, "%d\n", VAL_PAC ); }
;




This should normally accept input such as 'set foo=1' or 'set foo=(1,2)'.


Now what I did was shorten the <eq>... rules by nesting them in a start
condition scope, like so:


<eq>{
                  {T_PAO} { save_s; BEGIN( pao ); return( T_PAO ); }
                  <val>{T_PAC} { save_s; BEGIN( pac ); return( T_PAC ); }
                  <pao,comma>{T_VAL} { save_s; BEGIN( val ); return( T_VAL ); }
                  <val>{T_COMMA} { save_s; BEGIN( comma ); return( T_COMMA ); }
                  <pac>. { save_s; BEGIN( 0 ); }
}


This creates exactly the same scanner/parser (as evidenced by identical
parser.output files).


With both variations, running the command `echo "set bar=)" | lextest'
scans the T_PAC without complaint (that may not be desired in the final
program, but for now let's just accept that the scanner finds no problem
with it; the parser, however, rejects it and returns a syntax error).


The thing is, in the first variation, the fact that the T_PAC is scanned
can clearly be seen in the line <eq,val>{T_PAC} ... ,
because this explicitly allows a T_PAC after a T_EQ.


However, in the second variation, this fact (that T_PAC after T_EQ, i.e.
in start condition <eq>, is allowed) seems to me totally lost; I would
"read" the second variation as "when having been in start condition <eq>
[pushed], and subsequently being in start condition <val> [both exclusive],
allow a T_PAC". Read that way, a T_PAC would /not/ be allowed
directly following a T_EQ.


Also, just for the heck of it I tried this: <eq><val>{T_PAC} ...
and the created scanner/parser is exactly the same. Only in the light of
this (undocumented) 'feature' does the apparent working of the second
variation seem to make sense (regarding the curly brace as just a grouping
token); i.e. a sequence of <sc1><sc2>... is also to be read as
"either <sc1> or <sc2> or ...".
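
The two readings can be told apart with a stripped-down scanner. The
following is only a minimal sketch, not part of lextest: the file name
probe.l and the start conditions OUTER and INNER are made up for
illustration, and it assumes a flex that supports %option noyywrap and
start condition scopes (2.5.4 should).

```lex
%{
/* probe.l: hypothetical test file, not part of lextest. */
#include <stdio.h>
%}

%option noyywrap

    /* two exclusive start conditions, standing in for eq and val */
%x OUTER
%x INNER

%%

<INITIAL>"="    { BEGIN( OUTER ); }

<OUTER>{
    "("         { BEGIN( INNER ); }
    <INNER>")"  {
                  /* report the start condition this rule fired in */
                  printf( "')' matched in start condition %d\n", YY_START );
                  BEGIN( INITIAL );
                }
}

    /* catch-all: silently skip anything no earlier rule matched */
<*>.|\n         { ; }

%%

int main( void )
{
    yylex();
    return 0;
}
```

Build with `flex probe.l && cc lex.yy.c -o probe'. If
`echo "=)" | ./probe' prints the message, the rule tagged <INNER> inside
the <OUTER> scope is active in OUTER as well, i.e. the scope adds its
conditions to the rule's own list. If nothing is printed, the rule is
active in INNER only.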


bjd
[I didn't know you could do that. -John]

