Related articles |
---|
RDP is now available for FTP adrian@dcs.rhbnc.ac.uk (1994-02-20) |
Re: RDP is now available for FTP rfg@netcom.com (1994-02-24) |
Newsgroups: | comp.compilers |
From: | adrian@dcs.rhbnc.ac.uk |
Keywords: | tools, available, FTP |
Organization: | Compilers Central |
Date: | Sun, 20 Feb 1994 16:42:47 GMT |
A couple of weeks ago I posted a message about RDP, my recursive descent
parser generator. There's been rather a lot of interest so I've polished
it up into a first release version which is now available for anonymous
ftp. The rest of this rather long message describes the tool and how to
install it on MS-DOS and Unix. RDP is completely public domain, but I
would appreciate some email if you decide to use: this helps me justify my
existence.
Adrian Johnstone, adrian@dcs.rhbnc.ac.uk
** RDP release version 1.00 flyer. Adrian Johnstone 16 February 1994 **
RDP - a recursive descent parser generator
RDP compiles attributed LL(1) grammars decorated with C-language semantic
actions into recursive descent compilers. RDP is written in strict ANSI C
and produces strict ANSI C. RDP was developed using Borland C 3.1 on a PC
and has also been built with no problems on Alpha/OSF-1, DECstation/Ultrix
and Sun 4/SunOS hosts. The definition of strict ANSI here means compiling
with gcc -Wall -pedantic -ANSI and getting no warning messages.
RDP itself and the language processors it generates use standard symbol,
set and scanner modules. The RDP scanner is programmed by loading tokens
into the symbol table at the start of each run.
RDP produces complete runnable programs with built-in help information and
command line switches that are specified as part of the EBNF file. In this
sense RDP output is far more shrink-wrapped than the usual parser
generators which helps beginning students. RDP can generate itself which
is a nice demonstration of the bootstrapping technique used for porting
compilers to new architectures.
I wrote RDP because in October I start giving a course on compiler design.
I don't think the world needs another course on parsing techniques and am
really interested in code generation for exotic processors, so I tried to
produce a compact parser generator that would enable undergraduates to rip
through the syntax and standard code generation parts of the syllabus in a
few weeks, thus leaving me time to get into the black arts of code
scheduling and optimisation.
What you get.
- - An implementation of Pascal style set-handling in C.
- - A hash-coded symbol table with support for scoping and arbitrary user data
- - A programmable high-speed scanner with integrated error handling and
hooks for macros (RDP is being used to generate assemblers for two novel
microprocessors in addition to the usual applications of LL(1) parsers).
- - The source to a hand-coded version of the RDP translator. RDP checks that
the source grammar is LL(1) and explains exactly why a non-LL(1) grammar
is unacceptable. RDP does not attempt to rework a grammar by itself.
- - A decorated EBNF file describing RDP that may be processed by the
hand-coded RDP translator to produce a machine generated version of RDP.
This is good for boggling undergraduate's minds with.
- - A decorated EBNF file describing an interpreter for a language called TOY.
- - An EBNF file describing Pascal which can be used to generate a parser for
the Pascal language. This description needs some work: it is overstrict
with respect to semicolons and does not presently handle quites within
strings.
- - A pre-built RDP executable for MS-DOS.
- - Sources, makefiles and Borland 3.1 project files for everything which
you may use freely on condition that you send copies of any modifications,
enhancements and ill-conceived changes you might make back to me so that
I can improve RDP.
What you don't get.
- - Decent manuals (coming Real Soon Now). When the manuals arrive they will
be called rdp.ps which is a user manual for RDP proper, and rdp_supp.ps
which is a user manual for the set, symbol and scanner modules. If you
get a late version of this distribution you might be lucky and find them
in the kit.
- - Tutorial information on parsing and compiling (try Wirth's Algorithms +
Data Structures = Programs, Chapter 5 or your favourite compiler book).
- - Warranties. (This code has only just escaped from my personal toolkit.
I've put a lot of effort into making it fit for others to use, but in
the very nature of compiler compilers it is hard to test all the angles,
and the Garbage In - Garbage Out principle holds to the highest degree:
if you write ill-formed semantic actions you won't find out until you try
and compile the parser that RDP wrote for you.)
What I'd like.
- - Guinea pigs: I'm sure there are some bad surprises waiting for me, and I'd
like to find them before October so that my course doesn't get torpedoed.
- - A C grammar. The supplied Pascal grammar was produced by retrieving the
yacc Pascal description floating around on the net and hacking it into RDP
source form (which shortens it a great deal BTW). An obvious first
project is to do the same for the C grammar and maybe produce a
pretty-printer based on it.
RDP source language.
- - The RDP source language is a very conventional EBNF supplemented with:
a few pre-defined tokens such as ID (an alphanumeric identifier),
NEW_ID (an alphanumeric identifier that has not yet been added to
the symbol table), EOF and EOLN,
some programmable tokens such as STRING and STRING_ESC which define
Pascal and C style strings respectively,
a set of directives which are mainly used to programme the scanner
and the help subsystems
a weird construct < sequence > 'token' which is shorthand for
sequence {'token' sequence}
attributes of the form :identifier which may be attached to production
names or tokens where they define the type of variable returned on the
LHS and the name of the receiving variable on the RHS of an ::=
arbitrary C code semantic actions embedded in double quotes. These are
simply copied into the generated parser.
- - Here's the RDP grammar written in terms of itself. Tokens are enclosed
in single quotes, directives and built-in tokens are written in upper case
and comments are denoted by (* ... *). RDP is case sensitive.
(* RDP source grammar *)
unit ::= { prod | dir} EOF. (* Zero or more productions or directives *)
prod ::= (ID|NEW_ID) [':'(ID|NEW_ID) {'*'} ] '::=' alt '.' .
alt ::= < seq > '|' . (* Alternates are separated by | *)
seq ::= { (item_ret [':' (ID | NEW_ID) ] | item_inl) }.
item_ret ::= ID | NEW_ID | (* Production *)
token | (* Token *)
(* Following four are parameterisble tokens for handling strings
and comments *)
'STRING' '(' token ')' | (* Pascal style string *)
'STRING_ESC' '(' token token')' | (* C-style string *)
'COMMENT' '(' token token ')' | (* Non-nested comment *)
'COMMENT_NEST' '(' token token ')'. (* Nestable comment *)
item_inl ::= code | (* semantic action *)
'(' alt ')' | (* do first *)
'{' alt '}' | (* zero or more *)
'[' alt ']' | (* zero or one *)
'<' alt '>' token . (* list: <X>'t' expands to X {'t' X} *)
code ::= STRING_ESC('"' '\\'). (* Inline code is delimited by ".." *)
token ::= STRING_ESC('\'' '\\'). (* Tokens are delimited by '..' *)
comment ::= COMMENT_NEST('(*' '*)'). (* Nestable comments are delimited by (*..*) *)
dir ::= 'INCLUDE' '(' code ')' | (* Does for RDP what #include does for C *)
'HELP' '(' code code ')' | (* Add help line and command line switch *)
'USES' '(' code ')' | (* Adds a #include to the generated parser *)
'TITLE' '(' code ')' | (* Descriptive programme title for help *)
'SUFFIX' '(' code ')' | (* Defalt file suffix for scanner input *)
'PRE_PROCESS' '(' code ')' | (* Function to call before activating
parser *)
'POST_PROCESS' '(' code ')' | (* Function to call after activating
parser *)
'OUTPUT_FILE' '(' code ')' | (* Default output file name *)
'SET_SIZE' '(' NUMBER ')' | (* Set maximum set size
(must be >= # of tokens *)
'HASH_SIZE' '(' NUMBER ')' | (* Number of hash buckets in hash table *)
'HASH_PRIME' '(' NUMBER ')' | (* Hash key: see dragon book on hash
functions *)
'TAB_WIDTH' '(' NUMBER ')' | (* Tab expansion width in listings *)
'TEXT_SIZE' '(' NUMBER ')' | (* Default size of text buffer *)
'MAX_ERRORS' '(' NUMBER ')' | (* Stop after this many errors *)
'MAX_WARNINGS' '(' NUMBER ')' | (* Stop after this many warnings *)
'CASE_INSENSITIVE' | (* Make scanner case insensitive (cf Pascal) *)
'SHOW_SKIPS' | (* Issue 'Skipping...' informational messges *)
'NEWLINE_VISIBLE' | (* Make newlines visible
(cf Ada comments, assemblers) *)
'COMMENTS_VISIBLE'. (* Return comments to parser (cf assemblers) *)
(* End of grammar *)
How to get RDP.
- - Use anonymous ftp to cscx.dcs.rhbnc.ac.uk (134.219.200.45) and download
pub/rdp/rdp.tar.Z if you are a Unix user or pub/rdp/rdp.zip if you are an
MS-DOS user. Actually both files contain exactly the same distribution kit,
so you will find an MS-DOS executable in the tar file.
- - If you only have email access to the Internet, try one of the email based
ftp servers such as @decwrl.
- - If you can't manage ftp then I am prepared to send out 3.5in 1.4M disks
for MS-DOS. If you're really desperate I can supply just any other medium
that PC's, Amigas, Atari-ST's, Mac's or DEC computers have ever used, but
I'm very short of time so strange requests may take a while to service.
Other machines.
- - An ex-student of mine is going to try and build RDP for an Amiga. IMHO
anybody with an ANSI C compiler should have almost no problems building
from the standard kit. On the other hand K&R compiler users may suffer.
Installation.
1 Unpack the distribution kit into a single directory. You should have the
following files:
buildlog.dos A log of Borland make and Borland C 3.1 building for DOS
buildlog.unx A log of OSF/1 make and GNU CC for Alpha building for Unix
crash.c Fatal error handling and memory allocation
crash.h
makefile Customisable make file for Unix and MS-DOS Borland C 3.1
pascal.bnf Pascal grammar for building a Pascal parser
rdp.bnf Decorated RDP grammar for building machine generated RDP
rdp.c Hand written RDP front end
rdp.h
rdp.doc This file
rdp_aux.c Auxilliary file containing RDP semantic actions
rdp_aux.h
scan.c A programmable scanner
scan.h
set.c An implmentation of Pascal style sets
set.h
symbol.c A hash code symbol table
symbol.h
test.pas A piece of Pascal for testing the Pascal parser
test.toy A piece of Toy for testing the Toy parser
toy.bnf Decorated Toy grammar for building Toy interpreter
2 The makefile can be used on Unix or on a PC, but you should edit it to
make sure that it is configured for your operating system. Select one of
the two blocks of macro definitons at the top of the file.
3 To build everything, just type 'make'. The default target in the makefile
builds rdp, the Toy interpreter and the Pascal parser. The Toy interpreter
is run on test.toy, and the Pascal interpreter is run on test.pas. Neither
should generate any errors. On successful completion, make uses rdp to
build rdp_gen, a machine generated version of rdp and then uses rdp_gen to
reproduce itself. The two machine generated versions are compared with each
other to make sure that the bootstrap has been successful. They should
differ only in the program names.
The following messages appearing during the build are normal:
During compilation of toy.c
Warning toy.c 732: 'base' is assigned a value that is never used in function main
Warning toy.c 732: 'force' is assigned a value that is never used in function main
During the run of toy on test.toy
Toy V1.00 Feb 18 1994 20:07:04
test.toy: toy interpreter OK
Text mode: 0 errors, 0 warnings
During generation of pascal.c the dangling else causes
Error: ** Production 'statement__9' contains null but first /\ follow is not empty
During compilation of pascal.c
Warning pascal.c 1820: 'base' is assigned a value that is never used in function main
Warning pascal.c 1820: 'force' is assigned a value that is never used in function main
During the run of pascal on test.pas
Pascal parser V1.00 (generated) Feb 18 1994 20:07:10
test.pas: 0 errors, 0 warnings
During the comparison of the machine generated rdp versions
#include "rdp_genA.h"
#include "rdp_genB.h"
"Usage: rdp_genA [options] sourcefile \n\n"
"Usage: rdp_genB [options] sourcefile \n\n"
The buildlog.* files show a complete log for Unix and MS-DOS: if your
results are different then please let me know.
4 If you type 'make clean' all the object files and the machine generated
rdp versions will be deleted, leaving the distribution files plus the new
executables.
5 If you type 'make veryclean' then the directory is cleaned and the
executables are also deleted.
Missing bits to be fixed in version 1.1
There are a couple of things that I managed to break whilst preparing this
version. I've run out of time to work on RDP so I'll just note them here and
put out a maintenance release soon, maybe at the end of February 1994.
- - Presently the RDP text mode scanner is line oriented, so I have removed the
interpretation of while loops from Toy because they break if the while
statement is the first thing on the line. This is top priority to be fixed:
look out for version 1.1
- - Late on, I introduced a stupid bug into the handling of comment and string
delimiters so thatthey must be two characters long to work. Hence /* .. */
is fine but { } is not. Hence the Pascal parser does not parse comments. This
will also be fixed in version 1.1.
- - Due to a subtlety of the order in which sequences are evaluated, rdp_gen
and its first child differ, so I need to generate rdp_gen's child and
grandchild to do the convergence check in the makefile.
Enhancements for version 2
- - RDP parsers have large numbers of constant first and follow sets embedded
in them. For some applications, especially assemblers, many of these sets
are identical. In version 2 identical sets will be collapsed. This is only
likely to save a few kilobytes in even the most pathological parser, but it
offends my sense of neatness.
- - Macro handling code will be incorporated directly into the scanner. At the
moment I do macros within my assemblers (which I'm not releasing for
public consumption just yet).
Comments, queries and requests for code to
Dr Adrian Johnstone, CS Dept, Royal Holloway, University of London
Email: adrian@dcs.rhbnc.ac.uk
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.