Re: Fast LR algorithms -- yacc open code converter

megatest!djones@decwrl.dec.com (Dave Jones)
17 Oct 91 22:45:58 GMT

          From comp.compilers

Related articles
Fast LR algorithms haertel@euclid.uoregon.edu (1991-10-15)
Re: Fast LR algorithms -- yacc open code converter megatest!djones@decwrl.dec.com (1991-10-17)
| List of all articles for this month |

Newsgroups: comp.compilers
From: megatest!djones@decwrl.dec.com (Dave Jones)
Keywords: parse, yacc, code
Organization: Megatest Corporation, San Jose, Ca
References: 91-10-051
Date: 17 Oct 91 22:45:58 GMT

>From article 91-10-051, by haertel@euclid.uoregon.edu (Mike Haertel):
> I think we should be far more concerned with the speed of the resulting
> parsers, than with the speed with which we can produce parsers.


Any slowness is not due to the difference between LL and LR. Each
algorithm has to do equivalent state changes. Yacc is slow because the
calculation of the next action is done using a lookup-table. It could
instead produce "open" code, sacrificing size to speed. Back when yacc was
written, machines were often 100 times smaller than the typical
workstation today, so they had little option.


The y.output and y.tab.c files contain enough info to produce an "open
code" parser, however. So I wrote a program to do that. It is several
times larger, but also several times faster, perhaps ten to fifteen time
faster, than yacc's yyparse(). An added benefit is that there is no need
for dumping debug info to trace the state-changes. You can use a standard
debugger to step through the switches. The C-code looks much like the
y.output file.


Several people have written requesting the program. So here it is. It
comes without any warranty of any kind of course. Use at your own risk. I
hope someone will verify the parser's algorithm for correctness. If you
find any bugs, please let me know.


The following seems to work under BSD Unix 4.2, and Sun OS 3.5 and 4.1.
[Doesn't work on my SVR3.2, but the changes look straightforward. -John]


^^^^ snip ^^^^^ snip ^^^^ snip ^^^^^ snip ^^^^ snip ^^^^^ snip ^^^^ snip
#! /bin/sh
# This is a shell archive. Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file". To overwrite existing
# files, type "sh file -c". You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.. If this archive is complete, you
# will see the following message at the end:
# "End of shell archive."
# Contents: INSTALL README assemble.nawk makefile ynot-template ynot.c
# ynot.nawk yyparse.head.nawk yyparse.tail.nawk
# Wrapped by djones@prometheus on Thu Oct 17 15:27:47 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f INSTALL -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"INSTALL\"
else
echo shar: Extracting \"INSTALL\" \(422 characters\)
sed "s/^X//" >INSTALL <<'END_OF_INSTALL'
X
XTo install ynot...
X
X1. Change the name LIB in the makefile to the complete pathname of
Xthe library which will contain the ynot .nawk files, et cetera.
XChange the name BIN to the name of the directory with will contain
Xthe executable program.
X
X2. If the LIB directory is this local one, execute "make ynot",
Xthen copy the executatable "ynot" to the BIN directory.
XIf the LIB directory is elsewhere, execute "make install".
END_OF_INSTALL
echo shar: Missing newline added to \"INSTALL\"
if test 422 -ne `wc -c <INSTALL`; then
        echo shar: \"INSTALL\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f README -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"README\"
else
echo shar: Extracting \"README\" \(1419 characters\)
sed "s/^X//" >README <<'END_OF_README'
X
XThis program filters yacc's output files to generate a parser that uses
Xswitch-statements rather than coded tables. The result is a parser that
Xis potentially much larger, but also much faster than a standard yacc
Xparser. (Approximately twelve times larger, and fifteen times faster on
Xthe machines where I have tested it.) Additionally, a debugger can be used
Xto step through the state changes symbolically, which is a big help in
Xdebugging grammars.
X
XThere are two big switches, one for shifts, gotos, and the accept-action,
Xthe other for reduce-actions. The switches are annotated with comments
Xshowing their dotted-rule sets, taken from yacc's y.output file.
XThe actions are coded with the macros YSHIFT, YGOTO, and YACCEPT, so
Xthat stepping through the switch looks just like stepping through a
Xsymbolic schematic of the state-machine. Additionally there are
Xmacros YSHIFT_ERROR and YERROR for yacc-style error-recovery. The
Xreduce-rule switch uses the macro YREDUCE.
X
XThe processing is done by three NAWK (new AWK) scripts, which are
Xcontrolled by the C program ynot.c.
X
XThe program assumes quite a lot about the format of the y.output file
Xand the y.tab.c file. It works with BSD yacc and the yacc that was
Xfurnished with Sun 3.5 and Sun 4.1 OS's.
X
XCAVEAT: Some brain-damaged compilers generate bad code for big switches
Xwithout warning. (On old Unix systems, check the man-page for cc, and
Xlook for a -J flag.)
END_OF_README
echo shar: Missing newline added to \"README\"
if test 1419 -ne `wc -c <README`; then
        echo shar: \"README\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f assemble.nawk -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"assemble.nawk\"
else
echo shar: Extracting \"assemble.nawk\" \(142 characters\)
sed "s/^X//" >assemble.nawk <<'END_OF_assemble.nawk'
X#
X# This does one level of "include-file" processing for
X# the file ynot-template.
X
X/^INCLUDE/ {
X system( "cat " $2 );
X next;
X}
X
X{ print $0 }
END_OF_assemble.nawk
if test 142 -ne `wc -c <assemble.nawk`; then
        echo shar: \"assemble.nawk\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f makefile -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"makefile\"
else
echo shar: Extracting \"makefile\" \(329 characters\)
sed "s/^X//" >makefile <<'END_OF_makefile'
X
X# Change the following "." to the actual pathname of the
X# ynot directory.
X
XLIB = "."
X
X# Change the following "" to the actual pathname of the
X# ynot bin directory.
X
XBIN = "."
X
Xynot: ynot.c
X cc -g '-DLIB=$(LIB)' ynot.c -o ynot
X
Xinstall: ynot
X cp ynot $(BIN)
X cp ynot-template ynot.nawk yyparse.head.nawk ynot.tail.nawk $(LIB)
END_OF_makefile
if test 329 -ne `wc -c <makefile`; then
        echo shar: \"makefile\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f ynot-template -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"ynot-template\"
else
echo shar: Extracting \"ynot-template\" \(2355 characters\)
sed "s/^X//" >ynot-template <<'END_OF_ynot-template'
XINCLUDE ynot.top.i
XINCLUDE ynot.tab.h
X
X#define YGOTO(num) state=num; goto Push;
X#define YSHIFT(num) state=(num);goto Shift;
X#define YSHIFT_ERROR(num) state=num; goto Shift_error;
X#define YREDUCE(rule_num,len,lhs) rule=rule_num;symbol=lhs; length= len; goto Reduce;
X#define YERROR() goto error_recover;
X#define YACCEPT() return(0);
X
X#define yyerrok shift_more = 0
X
X#ifndef YYMAXDEPTH
X#define YYMAXDEPTH 100
X#endif YYMAXDEPTH
X
XYYSTYPE yyval, yylval;
XYYSTYPE v_stack[YYMAXDEPTH]; /* value-stack */
Xint s_stack[YYMAXDEPTH]; /* state-stack */
Xint yychar;
X
Xyyparse()
X{
X int shift_more = 0;
X register int symbol = 0;
X YYSTYPE value;
X register int state = 0;
X int rule;
X register int length;
X
X register YYSTYPE *yypvt = &v_stack[0];
X register int *yypst = &s_stack[0];
X
X yychar = -1;
X
X Shift: /* Get a token to "shift". */
X
X if(shift_more) {
X if(yychar == error)
X shift_more = 3;
X else
X shift_more--;
X }
X
X value = yylval;
X yychar = yylex();
X
X Push:
X
X /* Push the symbol (token or non-terminal) onto the stack. */
X
X if(++yypst > &s_stack[YYMAXDEPTH]) {
X yyerror("Parser stack overflow");
X return 1;
X }
X
X *yypst=state;
X *++yypvt=value;
X
X Next: /* Go to the next state, either by shift or reduce action. */
X
X switch(state){
XINCLUDE ynot.states.i
X };
X
X Reduce: /* Pop the stack and do a "goto" */
X
X /* Don't do stack-reductions during error recovery
X * until after "error" has been shifted. Discard from the stack instead.
X */
X if(shift_more == 4 ) {
X /* Pop the stack rather than doing reduction.
X */
X yychar = error;
X goto error_recover;
X }
X
X yyval = yypvt[1-length];
X
X switch(rule){
XINCLUDE ynot.actions.i
X ;
X
X value = yyval;
X yypvt -= length;
X yypst -= length;
X state = *yypst;
X
X switch(state){
XINCLUDE ynot.gotos.i
X };
X
X error_recover:
X
X if(yychar== T_EOF) {
X yyerror("Unexpected end of file");
X return -1;
X }
X
X if(shift_more==0) {
X yyerror("Syntax error");
X }
X
X if(yychar == error) {
X /* Cannot use "error" in this state.
X * Pop the state from the stack.
X */
X yypvt--;
X yypst--;
X if(yypst == &s_stack[0])
X return -1;
X state = *yypst;
X }
X else {
X yychar = error;
X }
X
X /* Shift error yychar and three good yychars
X * before reporting another syntax-error.
X */
X shift_more = 4;
X goto Next;
X
X}
END_OF_ynot-template
if test 2355 -ne `wc -c <ynot-template`; then
        echo shar: \"ynot-template\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f ynot.c -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"ynot.c\"
else
echo shar: Extracting \"ynot.c\" \(2134 characters\)
sed "s/^X//" >ynot.c <<'END_OF_ynot.c'
X#include <stdio.h>
Xextern char* malloc();
X
X/*
X * This program executes yacc, and then the various NAWK scripts
X * which build the switch-statement version of yyparse().
X */
X
Xstatic char* command_line;
X
X
Xstatic void
Xclean_up()
X{
X system("rm -f ynot.tab.h ynot.states.i ynot.gotos.i ynot.top.i ynot.actions.i" );
X}
X
X
Xstatic void
Xdo_command()
X{
X fprintf(stderr, "%s\n", command_line);
X if( 0 != system(command_line)) {
X clean_up();
X system("rm -f y.tab.c y.output y.tab.h");
X exit(-1);
X }
X}
X
Xstatic void
Xremove(name)
X char* name;
X{
X sprintf(command_line, "rm -f %s", name);
X system(command_line);
X}
X
Xmain(argc, argv)
X char** argv;
X{
X
X char* file_name;
X int v_flag = 0; /* preserve the y.output file */
X int d_flag = 0; /* preserve the y.tab.h file */
X
X argc--; argv++;
X
X while(argc > 0 && *argv[0] == '-') {
X
X int i;
X for(i=1; argv[0][i] != 0; i++) {
X if(argv[0][i] == 'v')
X v_flag = 1;
X else
X if(argv[0][i] == 'd')
X d_flag = 1;
X }
X argc--; argv++;
X }
X
X if(argc != 1) {
X fprintf(stderr, "ynot requires exactly one file-name argument.\n");
X exit(-1);
X }
X
X file_name = argv[0];
X command_line = malloc(strlen(file_name) + strlen(LIB) + 200);
X
X /* Do yacc */
X sprintf(command_line, "yacc -vd %s", file_name);
X do_command();
X
X /* create symbols, shifts, gotos from y.output */
X sprintf(command_line, "nawk -f %s/ynot.nawk symbols=ynot.tab.h states=ynot.states.i gotos=ynot.gotos.i y.output;", LIB);
X
X do_command();
X
X /* create top.i from y.tab.c */
X sprintf(command_line, "nawk -f %s/yyparse.head.nawk y.tab.c > ynot.top.i", LIB);
X do_command();
X
X /* ditto for ynot.actions.i */
X sprintf(command_line, "nawk -f %s/yyparse.tail.nawk y.tab.c > ynot.actions.i", LIB);
X do_command();
X
X /* assemble the parts */
X sprintf(command_line, "rm y.tab.c");
X do_command();
X
X sprintf(command_line, "nawk -f %s/assemble.nawk %s/ynot-template > y.tab.c", LIB, LIB);
X do_command();
X
X if(!v_flag)
X remove("y.output");
X
X if(!d_flag)
X remove("y.tab.h");
X
X clean_up();
X
X exit(0);
X}
X
END_OF_ynot.c
if test 2134 -ne `wc -c <ynot.c`; then
        echo shar: \"ynot.c\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f ynot.nawk -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"ynot.nawk\"
else
echo shar: Extracting \"ynot.nawk\" \(3586 characters\)
sed "s/^X//" >ynot.nawk <<'END_OF_ynot.nawk'
X
X# This nawk script (new awk) parses a y.output file from yacc,
X# and produces a C function yyparse() that is based on switch-statements
X# rather than coded tables.
X
XBEGIN {
X # For compatibility with yacc, -1 is a reserved token-number,
X # and 0 means end-of-file. We will use -2 as the error-token.
X # Therefore the first user-defined symbol will be -3.
X
X next_symbol = -3
X state_num = -1
X
X print "#define error (-2)" > symbols
X print "#define T_EOF (0)" > symbols
X
X }
X
X/^[0-9]+\/[0-9]+ terminals,/ { exit }
X
X/\_ \([0-9]+\)$/ {
X
X # Line contains a dotted rule (with dot at end)
X # and the rule number ...
X # Use it to give numbers to non-terminals, and lengths to
X # rules (productions).
X
X comment()
X
X # get the rule_num
X fields = split($0, a, "(")
X split(a[fields], n, ")")
X rule_num = n[1]
X
X # Since terminal symbols have small positive numbers,
X # we'll use negative numbers for nonterminals.
X #
X # Array number will map nonterminal names to numbers.
X # Array rule_len will map rule-numbers to RHS lengths.
X # Array rule_lhs will map rule-numbers to LHS nt-names.
X
X name = $1
X
X nt_number = sym_number(name);
X
X # We have to figure out how long RHS of rule is.
X # First, erase the LHS
X rhs = substr($0,index($0,":")+1)
X
X # Erase the dot and rule number
X sub(/\_ \([0-9]+\)/, "",rhs)
X
X rule_len[rule_num] = split(rhs, nevdull, " ")
X rule_lhs[rule_num] = nt_name(name)
X next
X }
X
X/^\t\$*[a-zA-Z0-9_]+ :/{
X comment()
X next
X}
X
X$0 == "" {
X if(in_comment){
X in_comment = 0
X print " */" > states
X print " */" > gotos
X }
X next
X}
X
X/^state/ {
X if(state_num != -1){
X end_state();
X }
X
X state_num = $2;
X
X print "" > states
X print " case " state_num ":switch(yychar){" > states
X print " /*" > states
X
X print "" > gotos
X print " case " state_num ":switch(symbol){" > gotos
X print " /*" > gotos
X
X in_comment = 1
X
X next
X }
X
X$2 == "goto" {
X print " case " nt_name($1) ": YGOTO(" $3 ")" > gotos
X }
X
X$2 == "reduce" {
X if($1=="." ){
X if(default_rule != "")
X {
X rn = default_rule;
X print " case '.': YREDUCE(" rn "," rule_len[rn] "," rule_lhs[rn] ")" > states
X }
X default_rule = $3
X }
X else
X print " case " token($1) ": YREDUCE(" $3 "," rule_len[$3] "," rule_lhs[$3] ")" > states
X next
X }
X
X$2 == "shift" {
X print " case " token($1) ": YSHIFT(" $3 ")" > states
X next
X }
X
X$2 == "accept" {
X print " case " token($1) ": YACCEPT()" > states
X next
X}
X
X
XEND {
X end_state()
X
X
X}
X
Xfunction end_state() {
X
X if(default_rule == ""){
X print " default: YERROR()" > states
X }else{
X rule_num = default_rule
X print " default: YREDUCE(" rule_num "," rule_len[rule_num] "," rule_lhs[rule_num] ")" > states
X }
X
X print " };" > states
X print " };" > gotos
X default_rule = ""
X
X}
X
Xfunction token(t){
X if(length(t) == 1 || substr(t,1,1) == "\\")
X return "'" t "'";
X else if (t== "$end")
X return "T_EOF"
X else
X return t;
X}
X
Xfunction empty_array(a){
X for ( e in a )
X delete a[e]
X}
X
Xfunction sym_number(name){
X nt_number = number[name];
X
X if(nt_number == "") {
X nt_number = number[name] = next_symbol--
X print "#define " nt_name(name) " (" nt_number ")" > symbols
X }
X
X return nt_number
X}
X
Xfunction nt_name(name){
X
X if(substr(name, 1, 2) == "$$")
X return "NT_" substr(name, 3 )
X
X return "NT_" name
X}
X
Xfunction comment(){
X if(!in_comment) {
X print " /*" > states
X print " /*" > gotos
X }
X in_comment = 1
X print " * " $0 > states
X print " * " $0 > gotos
X}
END_OF_ynot.nawk
if test 3586 -ne `wc -c <ynot.nawk`; then
        echo shar: \"ynot.nawk\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f yyparse.head.nawk -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"yyparse.head.nawk\"
else
echo shar: Extracting \"yyparse.head.nawk\" \(297 characters\)
sed "s/^X//" >yyparse.head.nawk <<'END_OF_yyparse.head.nawk'
X# This takes the top of y.tab.c, which contains the
X# #defines for YYSTYPE and the terminal symbols,
X# and writes it to the output. Assumes that the #defines end
X# with "yyclearin", which we do not use.
X
X/^#define yyclearin yychar = -1/ { exit }
X
X/^# line/ { next }
X
X { print }
END_OF_yyparse.head.nawk
if test 297 -ne `wc -c <yyparse.head.nawk`; then
        echo shar: \"yyparse.head.nawk\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f yyparse.tail.nawk -a "${1}" != "-c" ; then
    echo shar: Will not over-write existing file \"yyparse.tail.nawk\"
else
echo shar: Extracting \"yyparse.tail.nawk\" \(466 characters\)
sed "s/^X//" >yyparse.tail.nawk <<'END_OF_yyparse.tail.nawk'
X#
X# This copies the end of y.tab.c, which may contain user-defined
X# C-code produced from the actions in the grammar file (.y file).
X#
X# Assumes that the start of the actions is the first occurance of
X# "case" in the y.tab.c file, and that the actions end with the
X# comment, "stack new state and value".
X#
X/^case/ {
X print $0
X tail = 1
X next
X}
X
X/\/\* stack new state and value \*\// { exit }
X
X
X {
X if(tail) {
X print $0
X }
X next
X }
END_OF_yyparse.tail.nawk
if test 466 -ne `wc -c <yyparse.tail.nawk`; then
        echo shar: \"yyparse.tail.nawk\" unpacked with wrong size!
fi
# end of overwriting check
fi
echo shar: End of shell archive.
exit 0
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.