RDP is now available for FTP

adrian@dcs.rhbnc.ac.uk
Sun, 20 Feb 1994 16:42:47 GMT

          From comp.compilers

Related articles
RDP is now available for FTP adrian@dcs.rhbnc.ac.uk (1994-02-20)
Re: RDP is now available for FTP rfg@netcom.com (1994-02-24)
| List of all articles for this month |
Newsgroups: comp.compilers
From: adrian@dcs.rhbnc.ac.uk
Keywords: tools, available, FTP
Organization: Compilers Central
Date: Sun, 20 Feb 1994 16:42:47 GMT

A couple of weeks ago I posted a message about RDP, my recursive descent
parser generator. There's been rather a lot of interest so I've polished
it up into a first release version which is now available for anonymous
ftp. The rest of this rather long message describes the tool and how to
install it on MS-DOS and Unix. RDP is completely public domain, but I
would appreciate some email if you decide to use: this helps me justify my
existence.


Adrian Johnstone, adrian@dcs.rhbnc.ac.uk




** RDP release version 1.00 flyer. Adrian Johnstone 16 February 1994 **


RDP - a recursive descent parser generator


RDP compiles attributed LL(1) grammars decorated with C-language semantic
actions into recursive descent compilers. RDP is written in strict ANSI C
and produces strict ANSI C. RDP was developed using Borland C 3.1 on a PC
and has also been built with no problems on Alpha/OSF-1, DECstation/Ultrix
and Sun 4/SunOS hosts. The definition of strict ANSI here means compiling
with gcc -Wall -pedantic -ANSI and getting no warning messages.


RDP itself and the language processors it generates use standard symbol,
set and scanner modules. The RDP scanner is programmed by loading tokens
into the symbol table at the start of each run.


RDP produces complete runnable programs with built-in help information and
command line switches that are specified as part of the EBNF file. In this
sense RDP output is far more shrink-wrapped than the usual parser
generators which helps beginning students. RDP can generate itself which
is a nice demonstration of the bootstrapping technique used for porting
compilers to new architectures.


I wrote RDP because in October I start giving a course on compiler design.
I don't think the world needs another course on parsing techniques and am
really interested in code generation for exotic processors, so I tried to
produce a compact parser generator that would enable undergraduates to rip
through the syntax and standard code generation parts of the syllabus in a
few weeks, thus leaving me time to get into the black arts of code
scheduling and optimisation.


What you get.


- - An implementation of Pascal style set-handling in C.


- - A hash-coded symbol table with support for scoping and arbitrary user data


- - A programmable high-speed scanner with integrated error handling and
    hooks for macros (RDP is being used to generate assemblers for two novel
    microprocessors in addition to the usual applications of LL(1) parsers).


- - The source to a hand-coded version of the RDP translator. RDP checks that
    the source grammar is LL(1) and explains exactly why a non-LL(1) grammar
    is unacceptable. RDP does not attempt to rework a grammar by itself.


- - A decorated EBNF file describing RDP that may be processed by the
    hand-coded RDP translator to produce a machine generated version of RDP.
    This is good for boggling undergraduate's minds with.


- - A decorated EBNF file describing an interpreter for a language called TOY.


- - An EBNF file describing Pascal which can be used to generate a parser for
    the Pascal language. This description needs some work: it is overstrict
    with respect to semicolons and does not presently handle quites within
    strings.


- - A pre-built RDP executable for MS-DOS.


- - Sources, makefiles and Borland 3.1 project files for everything which
    you may use freely on condition that you send copies of any modifications,
    enhancements and ill-conceived changes you might make back to me so that
    I can improve RDP.


What you don't get.


- - Decent manuals (coming Real Soon Now). When the manuals arrive they will
    be called rdp.ps which is a user manual for RDP proper, and rdp_supp.ps
    which is a user manual for the set, symbol and scanner modules. If you
    get a late version of this distribution you might be lucky and find them
    in the kit.


- - Tutorial information on parsing and compiling (try Wirth's Algorithms +
    Data Structures = Programs, Chapter 5 or your favourite compiler book).


- - Warranties. (This code has only just escaped from my personal toolkit.
    I've put a lot of effort into making it fit for others to use, but in
    the very nature of compiler compilers it is hard to test all the angles,
    and the Garbage In - Garbage Out principle holds to the highest degree:
    if you write ill-formed semantic actions you won't find out until you try
    and compile the parser that RDP wrote for you.)


What I'd like.


- - Guinea pigs: I'm sure there are some bad surprises waiting for me, and I'd
    like to find them before October so that my course doesn't get torpedoed.


- - A C grammar. The supplied Pascal grammar was produced by retrieving the
    yacc Pascal description floating around on the net and hacking it into RDP
    source form (which shortens it a great deal BTW). An obvious first
    project is to do the same for the C grammar and maybe produce a
    pretty-printer based on it.


RDP source language.


- - The RDP source language is a very conventional EBNF supplemented with:


        a few pre-defined tokens such as ID (an alphanumeric identifier),
        NEW_ID (an alphanumeric identifier that has not yet been added to
        the symbol table), EOF and EOLN,


        some programmable tokens such as STRING and STRING_ESC which define
        Pascal and C style strings respectively,


        a set of directives which are mainly used to programme the scanner
        and the help subsystems


        a weird construct < sequence > 'token' which is shorthand for
        sequence {'token' sequence}


        attributes of the form :identifier which may be attached to production
        names or tokens where they define the type of variable returned on the
        LHS and the name of the receiving variable on the RHS of an ::=


        arbitrary C code semantic actions embedded in double quotes. These are
        simply copied into the generated parser.


- - Here's the RDP grammar written in terms of itself. Tokens are enclosed
    in single quotes, directives and built-in tokens are written in upper case
    and comments are denoted by (* ... *). RDP is case sensitive.


(* RDP source grammar *)
unit ::= { prod | dir} EOF. (* Zero or more productions or directives *)


prod ::= (ID|NEW_ID) [':'(ID|NEW_ID) {'*'} ] '::=' alt '.' .


alt ::= < seq > '|' . (* Alternates are separated by | *)


seq ::= { (item_ret [':' (ID | NEW_ID) ] | item_inl) }.


item_ret ::= ID | NEW_ID | (* Production *)
token | (* Token *)
(* Following four are parameterisble tokens for handling strings
      and comments *)
'STRING' '(' token ')' | (* Pascal style string *)
'STRING_ESC' '(' token token')' | (* C-style string *)
'COMMENT' '(' token token ')' | (* Non-nested comment *)
'COMMENT_NEST' '(' token token ')'. (* Nestable comment *)


item_inl ::= code | (* semantic action *)
'(' alt ')' | (* do first *)
'{' alt '}' | (* zero or more *)
'[' alt ']' | (* zero or one *)
'<' alt '>' token . (* list: <X>'t' expands to X {'t' X} *)


code ::= STRING_ESC('"' '\\'). (* Inline code is delimited by ".." *)
token ::= STRING_ESC('\'' '\\'). (* Tokens are delimited by '..' *)
comment ::= COMMENT_NEST('(*' '*)'). (* Nestable comments are delimited by (*..*) *)


dir ::= 'INCLUDE' '(' code ')' | (* Does for RDP what #include does for C *)
'HELP' '(' code code ')' | (* Add help line and command line switch *)
'USES' '(' code ')' | (* Adds a #include to the generated parser *)
'TITLE' '(' code ')' | (* Descriptive programme title for help *)
'SUFFIX' '(' code ')' | (* Defalt file suffix for scanner input *)
'PRE_PROCESS' '(' code ')' | (* Function to call before activating
parser *)
'POST_PROCESS' '(' code ')' | (* Function to call after activating
parser *)
'OUTPUT_FILE' '(' code ')' | (* Default output file name *)
'SET_SIZE' '(' NUMBER ')' | (* Set maximum set size
(must be >= # of tokens *)
'HASH_SIZE' '(' NUMBER ')' | (* Number of hash buckets in hash table *)
'HASH_PRIME' '(' NUMBER ')' | (* Hash key: see dragon book on hash
functions *)
'TAB_WIDTH' '(' NUMBER ')' | (* Tab expansion width in listings *)
'TEXT_SIZE' '(' NUMBER ')' | (* Default size of text buffer *)
'MAX_ERRORS' '(' NUMBER ')' | (* Stop after this many errors *)
'MAX_WARNINGS' '(' NUMBER ')' | (* Stop after this many warnings *)
'CASE_INSENSITIVE' | (* Make scanner case insensitive (cf Pascal) *)
'SHOW_SKIPS' | (* Issue 'Skipping...' informational messges *)
'NEWLINE_VISIBLE' | (* Make newlines visible
(cf Ada comments, assemblers) *)
'COMMENTS_VISIBLE'. (* Return comments to parser (cf assemblers) *)


(* End of grammar *)


How to get RDP.


- - Use anonymous ftp to cscx.dcs.rhbnc.ac.uk (134.219.200.45) and download
    pub/rdp/rdp.tar.Z if you are a Unix user or pub/rdp/rdp.zip if you are an
    MS-DOS user. Actually both files contain exactly the same distribution kit,
    so you will find an MS-DOS executable in the tar file.


- - If you only have email access to the Internet, try one of the email based
    ftp servers such as @decwrl.


- - If you can't manage ftp then I am prepared to send out 3.5in 1.4M disks
    for MS-DOS. If you're really desperate I can supply just any other medium
    that PC's, Amigas, Atari-ST's, Mac's or DEC computers have ever used, but
    I'm very short of time so strange requests may take a while to service.


Other machines.


- - An ex-student of mine is going to try and build RDP for an Amiga. IMHO
    anybody with an ANSI C compiler should have almost no problems building
    from the standard kit. On the other hand K&R compiler users may suffer.


Installation.


1 Unpack the distribution kit into a single directory. You should have the
    following files:


    buildlog.dos A log of Borland make and Borland C 3.1 building for DOS
    buildlog.unx A log of OSF/1 make and GNU CC for Alpha building for Unix
    crash.c Fatal error handling and memory allocation
    crash.h
    makefile Customisable make file for Unix and MS-DOS Borland C 3.1
    pascal.bnf Pascal grammar for building a Pascal parser
    rdp.bnf Decorated RDP grammar for building machine generated RDP
    rdp.c Hand written RDP front end
    rdp.h
    rdp.doc This file
    rdp_aux.c Auxilliary file containing RDP semantic actions
    rdp_aux.h
    scan.c A programmable scanner
    scan.h
    set.c An implmentation of Pascal style sets
    set.h
    symbol.c A hash code symbol table
    symbol.h
    test.pas A piece of Pascal for testing the Pascal parser
    test.toy A piece of Toy for testing the Toy parser
    toy.bnf Decorated Toy grammar for building Toy interpreter


2 The makefile can be used on Unix or on a PC, but you should edit it to
    make sure that it is configured for your operating system. Select one of
    the two blocks of macro definitons at the top of the file.


3 To build everything, just type 'make'. The default target in the makefile
    builds rdp, the Toy interpreter and the Pascal parser. The Toy interpreter
    is run on test.toy, and the Pascal interpreter is run on test.pas. Neither
    should generate any errors. On successful completion, make uses rdp to
    build rdp_gen, a machine generated version of rdp and then uses rdp_gen to
    reproduce itself. The two machine generated versions are compared with each
    other to make sure that the bootstrap has been successful. They should
    differ only in the program names.


    The following messages appearing during the build are normal:


    During compilation of toy.c
      Warning toy.c 732: 'base' is assigned a value that is never used in function main
      Warning toy.c 732: 'force' is assigned a value that is never used in function main


    During the run of toy on test.toy
      Toy V1.00 Feb 18 1994 20:07:04
      test.toy: toy interpreter OK
        Text mode: 0 errors, 0 warnings


    During generation of pascal.c the dangling else causes
      Error: ** Production 'statement__9' contains null but first /\ follow is not empty


    During compilation of pascal.c
      Warning pascal.c 1820: 'base' is assigned a value that is never used in function main
      Warning pascal.c 1820: 'force' is assigned a value that is never used in function main


    During the run of pascal on test.pas
      Pascal parser V1.00 (generated) Feb 18 1994 20:07:10
        test.pas: 0 errors, 0 warnings




    During the comparison of the machine generated rdp versions


      #include "rdp_genA.h"
      #include "rdp_genB.h"


      "Usage: rdp_genA [options] sourcefile \n\n"
      "Usage: rdp_genB [options] sourcefile \n\n"


    The buildlog.* files show a complete log for Unix and MS-DOS: if your
    results are different then please let me know.


4 If you type 'make clean' all the object files and the machine generated
    rdp versions will be deleted, leaving the distribution files plus the new
    executables.


5 If you type 'make veryclean' then the directory is cleaned and the
    executables are also deleted.


Missing bits to be fixed in version 1.1


There are a couple of things that I managed to break whilst preparing this
version. I've run out of time to work on RDP so I'll just note them here and
put out a maintenance release soon, maybe at the end of February 1994.


- - Presently the RDP text mode scanner is line oriented, so I have removed the
    interpretation of while loops from Toy because they break if the while
    statement is the first thing on the line. This is top priority to be fixed:
    look out for version 1.1


- - Late on, I introduced a stupid bug into the handling of comment and string
    delimiters so thatthey must be two characters long to work. Hence /* .. */
    is fine but { } is not. Hence the Pascal parser does not parse comments. This
    will also be fixed in version 1.1.


- - Due to a subtlety of the order in which sequences are evaluated, rdp_gen
    and its first child differ, so I need to generate rdp_gen's child and
    grandchild to do the convergence check in the makefile.


Enhancements for version 2


- - RDP parsers have large numbers of constant first and follow sets embedded
    in them. For some applications, especially assemblers, many of these sets
    are identical. In version 2 identical sets will be collapsed. This is only
    likely to save a few kilobytes in even the most pathological parser, but it
    offends my sense of neatness.


- - Macro handling code will be incorporated directly into the scanner. At the
    moment I do macros within my assemblers (which I'm not releasing for
    public consumption just yet).


Comments, queries and requests for code to


Dr Adrian Johnstone, CS Dept, Royal Holloway, University of London
Email: adrian@dcs.rhbnc.ac.uk
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.