Statement at a time parsing with yacc

compres!chris@crackers.clearpoint.com (Chris Clark)
Thu, 12 Dec 91 22:38:45 EST

          From comp.compilers

Related articles
Statement at a time parsing with yacc przemek@viewlogic.com (1991-12-06)
Re: Statement at a time parsing with yacc bart@cs.uoregon.edu (1991-12-10)
Re: Statement at a time parsing with yacc bliss@sp64.csrd.uiuc.edu (1991-12-10)
Re: Statement at a time parsing with yacc chris@mks.com.ca (1991-12-11)
Statement at a time parsing with yacc compres!chris@crackers.clearpoint.com (1991-12-12)
Parsing question arshad@dcs.ed.ac.uk (Arshad Mahmood) (1991-12-13)
Re: Statement at a time parsing with yacc d87-jse@nada.kth.se (1991-12-17)
| List of all articles for this month |

Newsgroups: comp.compilers
From: compres!chris@crackers.clearpoint.com (Chris Clark)
Keywords: yacc, parse, OOP
Organization: Compilers Central
Date: Thu, 12 Dec 91 22:38:45 EST

You asked,


> Is it possible to trick yacc to go into this "statement by statement" mode
> of operation? This has to be done dynamically, i.e. sometimes I want to
> parse a whole file at once and sometimes I want it to do it statement by
> statement with the parser returning control to me (yyparse) after each
> statement is parsed.


I think you could do it in yacc, but before I get into that I want to
suggest what I believe is an easier way, by using built-in incremental
lexing and parsing features of Yacc++(R) and the Language Objects Library.
Yacc++ is a rewrite of lex and yacc following object-oriented principles.


One feature which fell out of our object-oriented design is the ability to
stop and start the lexer and parser at any point during the parse and pick
up from there. Let me show you how to use this feature to implement
stopping the parser after a statement is recognized whenever the
"incremental_parse" flag is set. (This code is similar to one of our
tutorial examples used for a demonstration program at OOPSLA and C++ at
Work.)


> My top two rules are as follows and %start is set to top_statements:
>
>
> top_statements:
> statements {parse_return = $1;}
> | {parse_return = NULL;}
> ;
>
> statements:
> statement {$$ = cons ($1, NULL);}
> | statements statement {$$ = cons ($2, $1);}
> ;


A one-to-one translation to Yacc++ equivalent code with incremental
parsing option:


top_statements:
          statements {parse_return = yy_psr_ref(1);}
      | {parse_return = NULL;}
;


statements:
          statement {yy_psr_rslt() = cons (yy_psr_ref(1), NULL);
                                                      if (incremental_parse) yy_psr_pause();}
      | statements statement {yy_psr_rslt() = cons (yy_psr_ref(2), yy_psr_ref(1));
                                                      if (incremental_parse) yy_psr_pause();}
;


Yacc++ code would typically use a regular expression for the statements
production. (In fact, we would probably use "statement*" instead of
defining a non-terminal statements, but that's an efficiency and conflict
reduction issue.) There are also routines supplied in the Language
Objects Library for constructing the list. (Anything in these examples
with the yy_ prefix is supplied code).


statements: (statement {if (incremental_parse) {
yy_psr_rslt() = yy_ast_all(); // build list
// of entire lhs
yy_psr_pause(); // (seen so far)
} }
                              )*
                              { yy_psr_rslt() = yy_ast_all(); /* build list of entire lhs */}


So how does this code work? Well, first of all our lexer and parser
engines are not stand alone functions. They are class member functions
[in C++, in C we simulate the parts of C++ we need with macros and naming
conventions]. The actual lexer and parser objects are data structures
[instances of the appropriate class] which contain all of the state
information. When the member function yy_psr_pause() is called, it merely
sets the appropriate fields in the object which causes the lexer and
parser to return the next time they need input. Because the entire lexer
and parser state is in the objects (and not in local or static variables),
the lexer and parser pick up next time exactly where they left off.


The yy_psr_ref() and yy_psr_rslt() member functions manipulate the
semantic stack to provide the functionality of $d and $$. The
yy_ast_all() function takes all of the entries in the semantic stack for
the current production and builds a structure which represents them--if
yy_ast_all() is called while the production is only partially processed,
it will put only those entries seen so far in the structure. [Note, the
"list" yy_ast_all() builds is actually an array preceded by its length,
rather than a series of cons-cells.]


Another object-oriented feature of Yacc++ which may be relevant is the
"public" declaration. Public declarations list which fragments you wish
to recognize and our variant of yy_parse takes the fragment of the grammar
you wish to parse as one of its arguments. This would be an appropriate
method to use if you are building something like a syntax directed editor.


Now having explained the relevant features of Yacc++, let's consider what
can be done to help you out if you are committed to sticking with yacc.


(begin yacc suggestions)


One way is to move the appropriate variables into data structures and make
your own lexer and parser objects. I do not have enough experience
hacking on the lex and yacc engines to tell you how easy or hard that
would be. Some of the code (especially loops) in our engines had to be
carefully designed to make them able to pick up exactly where they left
off. For example, you need to consider issues like restarting the parser
when it was in the middle of error recovery when it was paused.


A second way which would be easier but is restrictive is to fake an early
termination of the file. This works if you have an "acceptable" grammar
like the one in your example. An acceptable grammar is one whose:


top production is a list and you only want to be able to stop
between elements of the list. The grammar needs to be
a list because you will not be saving and restoring the
yacc's parser state. You will be throwing away your left
                                context. Therefore, you must be able to restart by
                                pretending that you are at the beginning of a list.
                                Note, that this also has ramifications on your action
code--you can't simply use the yacc stack for
                                concatenating new statements on to the end of the
list when the list spans a yyparse call.


each element of the list needs to end cleanly so that the
parser does not need lookahead to know when a
statement ends. The statements have to end cleanly
because you don't want the parser to already have
asked the lexer for a lookahead token. You want the
artificial EOF token you insert to be be the next
token seen after the last token of the statement. If
the parser had already picked up the lookahead token,
so it wasn't your EOF token, then the parser would see
the fragment "statements lookahead-token EOF" which
would not be a legal fragment. (Using yyclearin may
remove this restriction.)


To fake an early termination, you will need a variable for communicating
from the parser to the lexer to tell the lexer when to send the EOF token
rather than the normal token (and probably a buffer for holding the token
which was not transmitted). You might end up inserting a whole filter
between lex and yacc to do that. The filter could be implemented as a
macro (possibly in the yacc parser sources) and might not be too much
code.


(end yacc suggestions)


Chris Clark (crackers.clearpoint.com!compres!chris)
Compiler Resources, Inc. 508 435 5016 voice
3 Proctor St. 508 435 4847 fax
Hopkinton, MA 01748


(R) Yacc++ is a registered trademark of Compiler Resources, Inc.
[In the yacc parser, the hard part is exactly dealing with lookahead tokens,
since in general you can't count on not needing them. -John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.