Is this grammar possible?

quixote@PrimeNet.Com (Donald A. Hosek)
Sat, 27 May 1995 01:13:58 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: quixote@PrimeNet.Com (Donald A. Hosek)
Keywords: yacc, question
Organization: Primenet Services for the Internet (602)395-1010
Date: Sat, 27 May 1995 01:13:58 GMT
Status: RO

The task: locate Bible references in a text stream and turn them into
hyperlinks. As far as possible, I've tried to keep distractions away from
the parser, but the new text stream I'm dealing with has occasional
difficult spots.


Previously, I had a bit of leeway because there was always some non-parsed
text separating parsed tokens, which made my life a great deal easier. In
my new context, I don't have that luxury. I've included the full yacc
parser at the end of this message. I'd like to be able to take a piece of
text that tokenizes as, say,
    SEP DASH DASH BOOK FULLREF SEP
and reduce that to
    atom atom bookref
or have
    SEP BOOK FULLREF COMMA SEP
reduce to
    bookref atom
Any suggestions? Or do I have to start making larger pieces of text into
SEP tokens?
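
The grouping asked for above can be sketched in plain C, outside yacc, to make the lookahead requirement concrete. This is only an illustration under stated assumptions: the `T_*` codes, `group_tokens` and `emit` are hypothetical names that do not appear in the parser below, and only the tokens from the two examples are modelled. Note that the trailing COMMA case is settled by peeking at the token after the COMMA, which is one token more than yacc's single lookahead provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes mirroring some of the %token names in the grammar. */
enum tok { T_SEP, T_BOOK, T_FULLREF, T_VERSEREF, T_COMMA, T_DASH };

static int is_ref(enum tok t)  { return t == T_FULLREF || t == T_VERSEREF; }
static int is_join(enum tok t) { return t == T_COMMA || t == T_DASH; }

/* Append a word to out, space separated; snprintf truncates if out is full. */
static void emit(char *out, size_t outsz, const char *word)
{
    size_t used = strlen(out);
    assert(used < outsz);
    snprintf(out + used, outsz - used, "%s%s", used ? " " : "", word);
}

/* Write the sequence of groups ("atom" or "bookref") found in t[0..n-1]. */
void group_tokens(const enum tok *t, size_t n, char *out, size_t outsz)
{
    size_t i = 0;
    out[0] = '\0';
    while (i < n) {
        if (t[i] == T_SEP) {
            i++;                      /* SEP is passed through, not grouped */
        } else if (t[i] == T_BOOK && i + 1 < n && is_ref(t[i + 1])) {
            i += 2;                   /* BOOK plus its first reference */
            /* A COMMA or DASH continues the reference only when a number
               follows it, so look one token past the COMMA or DASH. */
            while (i + 1 < n && is_join(t[i]) && is_ref(t[i + 1]))
                i += 2;
            emit(out, outsz, "bookref");
        } else {
            emit(out, outsz, "atom"); /* anything else goes back out as-is */
            i++;
        }
    }
}
```

Fed the two token sequences from the examples, this yields "atom atom bookref" and "bookref atom" respectively.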


yacc parser follows:
/*
      yacc code for parsing bible references.


      Bible references can take the following forms:
      BOOK NUM : NUM (book, chapter, verse)
      BOOK NUM (book, verse; chapter is implied)
      AND NUM : NUM (chapter, verse; use last value of BOOK)
      CHAP NUM : NUM (chapter, verse; use current book for BOOK)
      VERSE NUM (verse; use current book and chapter)
*/


/* top user-supplied code */
%{


#include <stdio.h> /* basic i/o routines */
#include "com.parse-refs.h" /* Load global declarations */


/* Temporary holding place for string data */
char tempstr[100],tempstra[100],tempstrb[100],outstr[100];


/* To allow us to communicate with lex more easily */
extern FILE *yyin, *yyout;


/* Maintain the current book, chapter and verse numbers */
int BookNo, ChapNo, VerseNo;


/* Macro to ship text out like lex--one argument which
must be a string. The string is written verbatim, so a '%' in the
text is not mistaken for a format directive. */
#define WRITE(s) fputs((s),yyout)


/* Macro to reset any temporary vars which need resetting.
Wrapped in do/while so that "RESET;" acts as a single statement. */
#define RESET do { outstr[0]='\0'; \
tempstr[0]='\0'; \
tempstra[0]='\0'; \
tempstrb[0]='\0'; } while (0)


%}




%union{
      int num;
      char *string;
}


/* We will define the tokens we get passed */
%token <string> SEP
%token <num> BOOK
%token <string> BOOK_LIKE
%token <num> VERSEREF FULLREF
%token CHAP VERSE
%token COMMA DASH
%token <string> PAGE VOL


%%
/* Now for production rules */


/* The first rule defines a list of parsable phrases. First are the
      ones that really matter, our assorted references. We finish with
      "atom", which refers to tokens which may have found their way
      to the parser inadvertently and should simply be sent back to
      the output
*/


reflist: SEP
{ WRITE($1); free($1); }
| parsable SEP
{ WRITE($2); free($2); }
| reflist SEP
{ WRITE($2); free($2); }
| reflist parsable SEP
{ WRITE($3); free($3); }
;


parsable: booklikeref | bookref | chapref | verseref | pagequote
|volquote | atom ;


booklikeref: BOOK_LIKE reference
{
sprintf(tempstr,"%s%s",$1,outstr);
free($1);
WRITE(tempstr);
RESET;
}
;


bookref: BOOK reference
{
BookNo=$1;
sprintf(tempstr,
"<blink linkid=%02d%03d%03d>%s%s</blink>",
BookNo,ChapNo,VerseNo,
BookList[$1],outstr);
WRITE(tempstr);
RESET;
}
;


chapref: CHAP reference
{
BookNo=CurrentBook;
sprintf(tempstr,
"<blink linkid=%02d%03d%03d>ch.%s</blink>",
BookNo,ChapNo,VerseNo,
outstr);
WRITE(tempstr);
RESET;
}
;


verseref: VERSE reference
{
sprintf(tempstr,
"<blink linkid=%02d%03d%03d>v.%s</blink>",
BookNo,ChapNo,VerseNo,
outstr);
WRITE(tempstr);
RESET;
}
;


pagequote: PAGE VERSEREF
{ sprintf(tempstr,"p.%d",$2); WRITE(tempstr); }
| PAGE VERSEREF DASH VERSEREF
{ sprintf(tempstr,"p.%d-%d",$2,$4); WRITE(tempstr); }
;


volquote: VOL VERSEREF
{ sprintf(tempstr,"Vol.%d",$2); WRITE(tempstr); }
;


reference: sreference
{ strcpy(outstr,tempstr); }
| sreference ereference
{
strcat(tempstr,outstr);
strcpy(outstr,tempstr);
}
;






sreference: VERSEREF
{
sprintf(tempstr,"%d", $1);
VerseNo=$1;
ChapNo=1; /* Standard default for 1 chapter books */
}
| FULLREF
{
ChapNo=$1 / 1000;
VerseNo=$1 % 1000;
sprintf(tempstr,"%d:%d", ChapNo, $1 % 1000);
}
;


ereference: sereference
| sereference ereference
;


sereference: COMMA eref
{
sprintf(tempstra,", %s",tempstrb);
strcat(outstr,tempstra);
}
| DASH eref
{
sprintf(tempstra,"-%s",tempstrb);
strcat(outstr,tempstra);
}
;


eref: VERSEREF
{ sprintf(tempstrb,"%d", $1); }
| FULLREF
{ sprintf(tempstrb,"%d:%d", $1 / 1000, $1 % 1000); }
;




atom: BOOK { WRITE(BookList[$1]); }
| CHAP { WRITE("ch."); }
| COMMA { WRITE(", "); }
| DASH { WRITE("-"); }
| DASH DASH { WRITE("--"); } /* Will this work? */
| FULLREF {
sprintf(tempstr,"%d:%d", $1 / 1000, $1 % 1000);
WRITE(tempstr);
}
| PAGE { WRITE($1); free($1); }
| VERSE { WRITE("ver."); }
| VERSEREF { sprintf(tempstr,"%d", $1); WRITE(tempstr); }
| VOL { WRITE($1); free($1); }
;


%%


/* User code, including our main routine */


/* The main routine just lexes and parses (as appropriate) stdin via yyin */
int main(void)
{
    yyin=stdin; /* Necessary, but why? */
    while(!feof(yyin)) {
          yyparse();
    }
    return 0;
}
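
The FULLREF actions above imply that the lexer hands over chapter and verse packed into one int as chapter*1000+verse, and the bookref action prints the link id as book, chapter and verse in 2+3+3 digits. A minimal sketch of that arithmetic follows; `pack_fullref` and `format_linkid` are hypothetical helper names, not part of the parser, and the lexer side of the packing is an assumption inferred from the "/ 1000" and "% 1000" in the actions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: pack chapter and verse into one int, the inverse of
   the "/ 1000" and "% 1000" used in the FULLREF actions. */
int pack_fullref(int chapter, int verse)
{
    assert(verse >= 0 && verse < 1000); /* a larger verse would spill into the chapter */
    return chapter * 1000 + verse;
}

/* Hypothetical helper: build the link id with the same "%02d%03d%03d"
   layout as the bookref action. Returns snprintf's result. */
int format_linkid(char *buf, size_t bufsz, int book, int fullref)
{
    return snprintf(buf, bufsz, "%02d%03d%03d",
                    book, fullref / 1000, fullref % 1000);
}
```

For example, with a made-up book number 43 and the reference 3:16, `pack_fullref(3, 16)` gives 3016 and `format_linkid` writes "43003016". The real book numbers are the indexes into BookList, which this post does not show.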




--
Don Hosek dhosek@quixote.com
909-621-1291 Coming soon: the Quixote Digital
FAX: 909-625-1342 Typography WWW site.
Quixote Digital Typography
--

