Re: WANTED: stylistic advice/guidelines for YACC and LEX

"Robert F. Monroe" <Robert@hever.demon.co.uk>
4 Feb 1996 12:41:06 -0500

          From comp.compilers


From: "Robert F. Monroe" <Robert@hever.demon.co.uk>
Newsgroups: comp.compilers
Date: 4 Feb 1996 12:41:06 -0500
Organization: Robert F. Monroe
References: 96-02-019 96-01-084 96-01-139
Keywords: yacc, lex

I put together a working example of the semantic actions/virtual
function idea that I was talking about. This interests me and I
believe that it is in keeping with the topic of this thread. I would
like to know if anyone else sees this as a useful technique, or if
they have seen something similar elsewhere. I will keep it as brief as
possible.


A few weeks ago, I was playing with the output of a part-of-speech
tagging program. I wrote a very simple yacc parser to read the
tagger's output and create an internal representation of the tagged
text. Nothing too fancy, but it struck me as a good candidate for
demonstrating the use of C++ virtual functions to implement semantic
actions. I originally wrote it in C, so I had to convert it to C++.


This is how it goes:


1. I wrote my yacc spec and the associated lex spec and ran them
through MKS LEX and Yacc using the -LC switch to generate C++ classes
for the scanner and parser. The yacc spec looks like this:


  %{
  #include "pos.h" // Structures for internal rep of a POS text
  #include "lex_yy.hpp" // Lex generated scanner class declaration
  #include "yparse.hpp" // Yacc generated parser class declaration,
           // revised to include 'virtual semantics'
  %}


  %union {
   word *pw;
   sentence *ps;
  }


  %token <pw> UNKNOWN CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS
  %token <pw> NP NPS PDT POS PP PP_DOLLAR RB RBR RBS RP SYM TO UH VB
  %token <pw> VBD VBG VBN VBP VBZ WDT WP WP_DOLLAR WRB DBL_Q DOLLAR
  %token <pw> HASH_M LEFT_SQ RIGHT_SQ LEFT_DQ RIGHT_DQ
  %token <pw> LEFT_PAREN RIGHT_PAREN COMMA FINAL_PUNCT MIDS_PUNCT


  %type <pw> word
  %type <ps> sentence sentence.list word.list pos.text


  %%


  pos.text: sentence.list {pPosTxt = PosText($1);}
   ;
  sentence.list: sentence {$$ = SentenceList($1);}
   | sentence.list sentence {$$ = SentenceList($1, $2);}
   ;
  sentence: word.list FINAL_PUNCT {$$ = Sentence($1, $2);}
   ;
  word.list: word {$$ = WordList($1);}
   | word.list word {$$ = WordList($2, $1);}
   ;
  word:
      UNKNOWN | CC | CD | DT | EX | FW | IN | JJ | JJR | JJS | LS |
      MD | NN | NNS | NP | NPS | PDT | POS | PP | PP_DOLLAR | RB |
      RBR | RBS | RP | SYM | TO | UH | VB | VBD | VBG | VBN | VBP |
      VBZ | WDT | WP | WP_DOLLAR | WRB | DBL_Q | DOLLAR |
      HASH_M | LEFT_SQ | RIGHT_SQ | LEFT_DQ | RIGHT_DQ |
      LEFT_PAREN | RIGHT_PAREN | COMMA | MIDS_PUNCT
   ;
  %%


2. I took the header file that contains the parser class declaration
created by yacc and added a virtual function for each semantic
action. I also added a pointer to the internal representation that can
be accessed by a class derived from yy_parse:


  class yy_parse {


  // Loads of standard yacc declarations removed for brevity


  protected:


  sentence *pPosTxt; // After a successful parse, pPosTxt
     // points to the internal rep of the text


  // The 'virtual semantics' functions
  virtual sentence *PosText(sentence *s);
  virtual sentence *Sentence(sentence *s, word *w);
  virtual sentence *SentenceList(sentence *s1,
   sentence *s2=(sentence*)0);
  virtual sentence *WordList(word *w,
   sentence *s=(sentence*)0);
  };


As a side note on the declarations of the WordList and SentenceList
functions: I am not a big fan of default parameters, but this is a
case where I think they serve a reasonable purpose. If the second
parameter did not default to (sentence*)0, I would have had to write
the null pointer into the grammar file myself. To me, that is just not
pretty. The alternative of passing NULL would mean including a header
that defines it (stddef.h or stdio.h, for example) or declaring NULL
myself. I don't think those options are pretty either. It may seem
trivial, but I like it.
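
One caveat with default arguments on virtual functions can be sketched
as follows. The classes Base and Derived are hypothetical stand-ins, not
the MKS-generated code. In C++ the default value is taken from the
static type used at the call site, while the function body is selected
dynamically. Since the grammar actions are compiled as part of yy_parse,
the defaults declared in yy_parse would be the ones that apply, whatever
a derived class declares.

```cpp
#include <cassert>

// Hypothetical stand-ins for yy_parse and a class derived from it.
struct Base {
    virtual ~Base() {}
    // Default argument declared in the base class, as in yy_parse.
    virtual int Action(int a, int b = 0) { return a + b; }
    // Plays the role of a grammar action making a one-argument call.
    int CallFromGrammar(int a) { return Action(a); }
};

struct Derived : Base {
    // This different default is NOT used for calls made through Base.
    virtual int Action(int a, int b = 100) { return 10 * (a + b); }
};

// Returns the result of a one-argument call made from base-class code.
int ViaBase(Base &p) { return p.CallFromGrammar(1); }

void CheckDefaults() {
    Derived d;
    assert(ViaBase(d) == 10);     // Derived body, Base default (b = 0)
    assert(d.Action(1) == 1010);  // Derived body, Derived default (b = 100)
}
```

Keeping the same default value in the base class and in every override,
as the declarations in this post do, avoids the surprise.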


3. I created an implementation file for yy_parse's semantic action
functions. These functions do nothing. The implementation file looks
like this:


  #include "pos.h"
  #include "lex_yy.hpp"
  #include "yparse.hpp"


  sentence *yy_parse::SentenceList(sentence *s1, sentence *s2)
   {return s1;}
  sentence *yy_parse::Sentence(sentence *s, word *w)
   {return s;}
  sentence *yy_parse::WordList(word *w, sentence *s)
   {return s;}
  sentence *yy_parse::PosText(sentence *s)
   {return s;}


At this point, I could compile and link the files created by lex and
yacc, add in the yy_parse actions implementation file and a main
function, and I would have a complete parsing program that only does
syntax checking on a part-of-speech tagged file.


4. To do something else with the parser, I derived a class called
SentenceReport from yy_parse:


  class SentenceReport: public yy_parse
  {
   protected:


   sentence *PosText(sentence *s);
   sentence *Sentence(sentence *s, word *w);
   sentence *WordList(word *w,
    sentence *s=(sentence*)0);
   sentence *SentenceList(sentence *s1,
    sentence *s2=(sentence*)0);


   public:


   sentence *PrintReport(sentence *s);
  };


The implementation of SentenceReport builds an internal representation
of the input text. It also adds a function to output the text in a
report format.
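
For readers who want to see the shape of the technique without the
generated code, here is a minimal, self-contained sketch. It is not the
implementation from my program: the word and sentence structures, the
Parser and Builder classes, and the ParseOneSentence and BuiltWordCount
helpers are all hypothetical stand-ins, and ParseOneSentence merely
issues the calls that the grammar actions would make for one sentence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the types declared in pos.h.
struct word { std::string text, tag; };
struct sentence {
    std::vector<word *> words;  // words in order of appearance
    sentence *next;             // next sentence in the text
    sentence() : next(0) {}
};

// Stand-in for the generated parser class with do-nothing actions.
class Parser {
public:
    virtual ~Parser() {}
    // Issues the action calls the grammar would make while reducing
    // one sentence: word.list, sentence, sentence.list, pos.text.
    sentence *ParseOneSentence(std::vector<word> &ws, word *punct) {
        assert(!ws.empty());
        sentence *s = WordList(&ws[0]);
        for (std::size_t i = 1; i < ws.size(); ++i)
            s = WordList(&ws[i], s);
        return PosText(SentenceList(Sentence(s, punct)));
    }
protected:
    virtual sentence *PosText(sentence *s) { return s; }
    virtual sentence *Sentence(sentence *s, word *) { return s; }
    virtual sentence *SentenceList(sentence *s1, sentence * = 0) { return s1; }
    virtual sentence *WordList(word *, sentence *s = 0) { return s; }
};

// Derived class whose overrides build the internal representation.
class Builder : public Parser {
protected:
    virtual sentence *Sentence(sentence *s, word *w) {
        s->words.push_back(w);  // closing punctuation ends the sentence
        return s;
    }
    virtual sentence *SentenceList(sentence *s1, sentence *s2 = 0) {
        sentence *tail = s1;
        while (tail->next) tail = tail->next;
        tail->next = s2;        // append; a null s2 leaves the list as is
        return s1;
    }
    virtual sentence *WordList(word *w, sentence *s = 0) {
        if (!s) s = new sentence;  // first word starts a new sentence
        s->words.push_back(w);
        return s;
    }
};

// Runs the same simulated parse through whichever parser is supplied
// and reports how many words ended up in the representation.
std::size_t BuiltWordCount(Parser &p) {
    std::vector<word> ws(2);
    ws[0].text = "dogs"; ws[0].tag = "NNS";
    ws[1].text = "bark"; ws[1].tag = "VBP";
    word dot;
    dot.text = "."; dot.tag = "FINAL_PUNCT";
    sentence *s = p.ParseOneSentence(ws, &dot);
    std::size_t n = s ? s->words.size() : 0;
    delete s;  // deleting a null pointer is harmless
    return n;
}
```

With the base class the actions build nothing, so the count is zero;
with Builder the same sequence of calls yields a three-word sentence.
The driver code is identical in both cases, which is the point of the
exercise.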


That is pretty much it. I like the idea because it uses
C++ to go one step beyond the modularity that TimD describes in his
original posting. On the other hand, it is somewhat more of an effort
to lay it out this way. Because I have not come across a program or
group of programs that would obviously benefit from it, I tend to
question whether it is worth using in 'real' programs.


Is this taking yacc/C++ style too far? At one point I was considering
hiding the parse tree data structures from the parser. The only way
that I could think of doing that would rely too heavily on void
pointers (at least the way I have it in my mind). In the end, it
seemed that the parser has a right to know the internal representation
of what it is parsing, so I chucked the idea.
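
For what it is worth, one possible alternative to void pointers is to
give the parser only forward declarations of the structures. This is a
sketch under assumptions, not something I have tried with the MKS
output: OpaqueParser, Counting, and LengthAfterOneReduce are
hypothetical names, and the word and sentence layouts are invented for
the example. The parser side passes typed pointers around without ever
seeing the definitions, which only the derived class needs.

```cpp
#include <cassert>

// Parser side: forward declarations only, no knowledge of the layout.
struct word;
struct sentence;

class OpaqueParser {
public:
    virtual ~OpaqueParser() {}
    // Simulates a reduction; pointers are passed on, never dereferenced.
    sentence *Reduce(sentence *s, word *w) { return Sentence(s, w); }
protected:
    virtual sentence *Sentence(sentence *s, word *) { return s; }
};

// Semantics side: the full definitions are visible only from here on.
struct word { int tag; };
struct sentence { int length; };

class Counting : public OpaqueParser {
protected:
    virtual sentence *Sentence(sentence *s, word *) {
        ++s->length;  // only the derived class touches the members
        return s;
    }
};

// Performs one simulated reduction and returns the resulting length.
int LengthAfterOneReduce() {
    sentence s;
    s.length = 0;
    word w;
    w.tag = 1;
    Counting c;
    sentence *r = c.Reduce(&s, &w);
    assert(r == &s);  // the same object is handed back
    return r->length;
}
```

In a real layout the two halves would live in separate files, with the
structure definitions included only by the derived class's
implementation.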


Do any of you see any significant gains to be made by handling yacc
action code in this manner?


Just curious,
Bob.


PS: This code was taken from a working program that I put together for
the purpose of demonstrating this technique. If anyone is interested
enough, I would be happy to pass on all of the source code to you.

