Recursive Descent Pascal Compiler - Data driven implementation ??

Andrew Lord <Andrew_Lord@tertio.co.uk>
16 Feb 1997 22:51:14 -0500

          From comp.compilers


From: Andrew Lord <Andrew_Lord@tertio.co.uk>
Newsgroups: comp.compilers
Date: 16 Feb 1997 22:51:14 -0500
Organization: Compilers Central
Keywords: Pascal, question, LL(1)

I plan to re-write, in C, a recursive descent Pascal compiler that I
initially wrote many moons ago (1985?). That version handled only a
subset of the language, but I plan to go the whole way this time!


I want to get my hands dirty (to a certain extent), so I don't want to
use any compiler-writing tools. The parser bit is easy and looks real
pretty until I start to deal with
error recovery,
long syntax/semantics,
and
code generation.


My original version had a procedure for each production (approximately),
e.g. parseProgram, parseBlock, parseStatement etc., and the logic for
each production was reflected by the 'flow' of the C program,
eg
parseIf()
{
    accept(IF);
    parseBoolExp();
    accept(THEN);
    parseStatement();
}


I then used starter and follow sets for error recovery.


Now that I'm re-writing the compiler I want to make the code 'simpler'
by effectively just specifying the production rules and letting the
compiler work out the flow of control and the starter and follow sets
for itself.


This is a bit inefficient, I know; I'll deal with that once I get
something working. Perhaps all the static data will be loaded from a
file rather than re-calculated. That's a side issue for me at the moment.


Eg. /* pseudo code */


Production if_prod = { "IF", bool_exp_prod, "THEN", statement_prod };


Production block_prod =
      { "BEGIN", statement_prod, { ";", statement_prod }, "END" };
      /* inner braces denote repetition */


/* This function just synchronises the parser with the input stream
   before calling the real parse function */
parseProduction(Production *production, Set followSet, Set beacons)
{
    InitStarterSet(production);
    if (!(SYM in production->starterSet))
    {
        outputError();
        skipto(production->starterSet u followSet u beacons);
    }
    if (SYM in production->starterSet)
    {
        reallyParseProduction(production, followSet, beacons);
        if (!(SYM in followSet))
        {
            outputError();
            skipto(followSet u beacons);
        }
    }
}


reallyParseProduction(Production *production, Set followSet, Set beacons)
{
    Struct privateData;

    /*---------------------------------------------------------
      fancy code to step through a production definition and call
      the "methods" using "privateData" as their shared area.
    ---------------------------------------------------------*/
}


The parseProduction function will work out starter sets *once*, if it
encounters a production it hasn't yet parsed, and store the set in the
Production structure. It will also deal with error recovery and the
like.


This is the horrible(ish) bit.


Each item in the production definition will also have a method (a
function pointer) associated with it that will check the semantics and
generate code for that point in the definition. Eg.


Production if_prod = {
    { "IF",           NULL },
    { bool_exp_prod,  NULL },
    { "THEN",         createJumpIfFalseLabel },
    { statement_prod, outputLabel }
};


  All of the methods in one production will share the private data area.


My question is: is this a good way to do things?
I have my misgivings. It should
1) make for a more extensible compiler - due to the data driven model
2) give more consistency regarding error recovery - all error recovery
   is in one place - the parseProduction() function.


Obviously it will be slower than a pure "code driven" compiler, but it
*may* be more compact.


All the little "method" functions may end up looking really horrible.
All of the extra function calls may incur a serious overhead.


Is it time to read up on all that "table driven" compiler stuff and do
things properly?


Andy L.
Andrew_Lord@tertio.co.uk
--

