Subject: Re: WANTED: stylistic advice/guidelines for YACC and LEX
From: "Robert F. Monroe" <Robert@hever.demon.co.uk>
Newsgroups: comp.compilers
Date: 4 Feb 1996 12:41:06 -0500
Organization: Robert F. Monroe
References: 96-02-019 96-01-084 96-01-139
Keywords: yacc, lex

I put together a working example of the semantic actions/virtual
function idea that I was talking about. This interests me and I
believe that it is in keeping with the topic of this thread. I would
like to know if anyone else sees this as a useful technique, or if
they have seen something similar elsewhere. I will keep it as brief as
possible.
A few weeks ago, I was playing with the output of a part-of-speech
tagging program. I wrote a very simple yacc parser to read the
tagger's output and create an internal representation of the tagged
text. Nothing too fancy, but it struck me as a good candidate for
demonstrating the use of C++ virtual functions to implement semantic
actions. I originally wrote it in C, so I had to convert it to C++.
This is how it goes:
1. I wrote my yacc spec and the associated lex spec and ran them
through MKS LEX and Yacc using the -LC switch to generate C++ classes
for the scanner and parser. The yacc spec looks like this:

%{
#include "pos.h"      // Structures for internal rep of a POS text
#include "lex_yy.hpp" // Lex generated scanner class declaration
#include "yparse.hpp" // Yacc generated parser class declaration,
                      // revised to include 'virtual semantics'
%}

%union {
    word *pw;
    sentence *ps;
}

%token <pw> UNKNOWN CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS
%token <pw> NP NPS PDT POS PP PP_DOLLAR RB RBR RBS RP SYM TO UH VB
%token <pw> VBD VBG VBN VBP VBZ WDT WP WP_DOLLAR WRB DBL_Q DOLLAR
%token <pw> HASH_M LEFT_SQ RIGHT_SQ LEFT_DQ RIGHT_DQ
%token <pw> LEFT_PAREN RIGHT_PAREN COMMA FINAL_PUNCT MIDS_PUNCT

%type <pw> word
%type <ps> sentence sentence.list word.list pos.text

%%

pos.text:       sentence.list               {pPosTxt = PosText($1);}
        ;

sentence.list:  sentence                    {$$ = SentenceList($1);}
        |       sentence.list sentence      {$$ = SentenceList($1, $2);}
        ;

sentence:       word.list FINAL_PUNCT       {$$ = Sentence($1, $2);}
        ;

word.list:      word                        {$$ = WordList($1);}
        |       word.list word              {$$ = WordList($2, $1);}
        ;

word:
        UNKNOWN | CC | CD | DT | EX | FW | IN | JJ | JJR | JJS | LS |
        MD | NN | NNS | NP | NPS | PDT | POS | PP | PP_DOLLAR | RB |
        RBR | RBS | RP | SYM | TO | UH | VB | VBD | VBG | VBN | VBP |
        VBZ | WDT | WP | WP_DOLLAR | WRB | DBL_Q | DOLLAR |
        HASH_M | LEFT_SQ | RIGHT_SQ | LEFT_DQ | RIGHT_DQ |
        LEFT_PAREN | RIGHT_PAREN | COMMA | MIDS_PUNCT
        ;

%%

2. I took the header file that contains the parser class declaration
created by yacc and added a virtual function for each semantic
action. I also added a pointer to the internal representation that can
be accessed by a class derived from yy_parse:

class yy_parse {
    // Loads of standard yacc declarations removed for brevity
protected:
    sentence *pPosTxt;  // After a successful parse, pPosTxt
                        // points to the internal rep of the text

    // The 'virtual semantics' functions
    virtual sentence *PosText(sentence *s);
    virtual sentence *Sentence(sentence *s, word *w);
    virtual sentence *SentenceList(sentence *s1,
                                   sentence *s2=(sentence*)0);
    virtual sentence *WordList(word *w,
                               sentence *s=(sentence*)0);
};

As a side note on the declarations of the WordList and SentenceList
functions: I am not a big fan of default parameters, but this is a
case where I think they serve a reasonable purpose. If the second
parameter did not default to (sentence*)0, I would have had to write
that cast into the single-argument actions in the grammar file. To me,
that is just not pretty. The alternative of passing NULL would mean
including a header such as stdio.h or stddef.h, or declaring NULL
myself. I don't think those options are pretty either. It may seem
trivial, but I like it.
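
To make the default-parameter point concrete, here is a minimal sketch of how the two word.list actions end up calling the same virtual function. It is not the generated parser: ParserBase, Counter, reduceFirstWord, reduceNextWord, countThreeWords and the layouts of word and sentence are all hypothetical stand-ins. One detail worth keeping in mind is that C++ picks a default argument from the static type of the call, so calls made from inside the base class use the default declared in the base class, whichever override actually runs.

```cpp
#include <cassert>

// Hypothetical stand-ins for the structures in pos.h, which the post does not show.
struct word     { const char *text; };
struct sentence { int nwords; };

// Plays the role of yy_parse: the grammar actions are member calls made from here.
class ParserBase {
public:
    virtual ~ParserBase() {}
    // Mimics the action for `word.list: word`, i.e. {$$ = WordList($1);}
    sentence *reduceFirstWord(word *w) { return WordList(w); }
    // Mimics the action for `word.list: word.list word`, i.e. {$$ = WordList($2, $1);}
    sentence *reduceNextWord(sentence *s, word *w) { return WordList(w, s); }
protected:
    // Do-nothing default action; the one-argument call above relies on this default.
    virtual sentence *WordList(word *, sentence *s = (sentence*)0) { return s; }
};

// A derived "semantics" class that only counts the words it is handed.
class Counter : public ParserBase {
protected:
    // The default is repeated for symmetry; calls made from ParserBase
    // still take their default from ParserBase's declaration.
    virtual sentence *WordList(word *, sentence *s = (sentence*)0) {
        if (!s) { s = new sentence; s->nwords = 0; }   // first word of a list
        ++s->nwords;
        return s;
    }
};

// Hypothetical helper: push three words through the two actions and count them.
int countThreeWords() {
    Counter c;
    word a = {"the"}, b = {"cat"}, d = {"sat"};
    sentence *s = c.reduceFirstWord(&a);
    assert(s != 0);
    s = c.reduceNextWord(s, &b);
    s = c.reduceNextWord(s, &d);
    int n = s->nwords;
    delete s;
    return n;
}
```

Because the default lives in one place that the parser actually consults, keeping the same value in every override, as the declarations above do, avoids surprises for anyone who calls a derived class directly.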
3. I created an implementation file for yy_parse's semantic action
functions. These functions do nothing except return one of their
arguments. The implementation file looks like this:

#include "pos.h"
#include "lex_yy.hpp"
#include "yparse.hpp"

sentence *yy_parse::SentenceList(sentence *s1, sentence *s2)
    {return s1;}

sentence *yy_parse::Sentence(sentence *s, word *w)
    {return s;}

sentence *yy_parse::WordList(word *w, sentence *s)
    {return s;}

sentence *yy_parse::PosText(sentence *s)
    {return s;}

At this point, I could compile and link the files created by lex and
yacc, add in the yy_parse actions implementation file and a main
function, and I would have a complete parsing program that only does
syntax checking on a part-of-speech tagged file.
4. To do something else with the parser, I derived a class called
SentenceReport from yy_parse:

class SentenceReport: public yy_parse
{
protected:
    sentence *PosText(sentence *s);
    sentence *Sentence(sentence *s, word *w);
    sentence *WordList(word *w,
                       sentence *s=(sentence*)0);
    sentence *SentenceList(sentence *s1,
                           sentence *s2=(sentence*)0);
public:
    sentence *PrintReport(sentence *s);
};

The implementation of SentenceReport builds an internal representation
of the input text. It also adds a function to output the text in a
report format.
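
For readers who want to see the shape of such a derived class, the following is a rough, self-contained sketch rather than the actual SentenceReport code. The layouts of word and sentence, the ParseActions base (a stand-in for the generated yy_parse, with its actions made public so a driver can call them), the replay and demoReport helpers, and the extra stream parameter on PrintReport are all assumptions made for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical layouts; the real ones live in pos.h, which the post does not show.
struct word {
    std::string text, tag;
    word *next;                       // next word in the same sentence
    word(const char *t, const char *g) : text(t), tag(g), next(0) {}
};
struct sentence {
    word *first, *last, *punct;       // words in input order, then the FINAL_PUNCT
    sentence *next;                   // next sentence in the text
    sentence() : first(0), last(0), punct(0), next(0) {}
};

// Stand-in for the generated yy_parse, reduced to its do-nothing actions.
// They are public here only so the demo driver below can call them.
class ParseActions {
public:
    virtual ~ParseActions() {}
    virtual sentence *PosText(sentence *s) { return s; }
    virtual sentence *Sentence(sentence *s, word *) { return s; }
    virtual sentence *SentenceList(sentence *s1, sentence * = 0) { return s1; }
    virtual sentence *WordList(word *, sentence *s = 0) { return s; }
};

class SentenceReport : public ParseActions {
public:
    virtual sentence *WordList(word *w, sentence *s = 0) {
        if (!s) s = new sentence;                 // first word opens a sentence
        if (s->last) s->last->next = w; else s->first = w;
        s->last = w;
        return s;
    }
    virtual sentence *Sentence(sentence *s, word *w) { s->punct = w; return s; }
    virtual sentence *SentenceList(sentence *s1, sentence *s2 = 0) {
        if (!s2) return s1;
        sentence *tail = s1;
        while (tail->next) tail = tail->next;     // append s2 at the end
        tail->next = s2;
        return s1;
    }
    // One numbered line per sentence, each word printed as text/TAG.
    sentence *PrintReport(sentence *s, std::ostream &out) {
        int n = 1;
        for (sentence *p = s; p; p = p->next, ++n) {
            out << n << ":";
            for (word *w = p->first; w; w = w->next)
                out << ' ' << w->text << '/' << w->tag;
            if (p->punct) out << ' ' << p->punct->text << '/' << p->punct->tag;
            out << '\n';
        }
        return s;
    }
};

// Replays the reductions a parser would make for two short sentences;
// w must point at six words: two words and a full stop, twice.
sentence *replay(ParseActions &p, word *w) {
    sentence *s1 = p.Sentence(p.WordList(&w[1], p.WordList(&w[0])), &w[2]);
    sentence *s2 = p.Sentence(p.WordList(&w[4], p.WordList(&w[3])), &w[5]);
    return p.PosText(p.SentenceList(p.SentenceList(s1), s2));
}

// Builds the representation with SentenceReport and returns the report text.
std::string demoReport() {
    word w[] = { word("The", "DT"), word("cat", "NN"),  word(".", "."),
                 word("It", "PP"),  word("sat", "VBD"), word(".", ".") };
    SentenceReport r;
    std::ostringstream out;
    sentence *text = replay(r, w);
    assert(text != 0);
    r.PrintReport(text, out);
    while (text) { sentence *dead = text; text = text->next; delete dead; }
    return out.str();
}
```

Passing a plain ParseActions object to replay yields a null pointer, which matches the syntax-check-only behaviour of the base class; passing a SentenceReport builds the lists that PrintReport walks.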
That is pretty much it. I like the idea because it uses
C++ to go one step beyond the modularity that TimD describes in his
original posting. On the other hand, it is somewhat more of an effort
to lay it out this way. Because I have not come across a program or
group of programs that would obviously benefit from it, I tend to
question whether it is worth using in 'real' programs.
Is this taking yacc/C++ style too far? At one point I was considering
hiding the parse tree data structures from the parser. The only way
that I could think of doing that would rely too heavily on void
pointers (at least the way I have it in my mind). In the end, it
seemed that the parser has a right to know the internal representation
of what it is parsing, so I chucked the idea.
Do any of you see any significant gains to be made by handling yacc
action code in this manner?
Just curious,
Bob.
PS: This code was taken from a working program that I put together for
the purpose of demonstrating this technique. If anyone is interested
enough, I would be happy to pass on all of the source code to you.