Re: OO Parsing References?

jkuha@cc.helsinki.fi
Sun, 25 Aug 1991 15:23:43 GMT

          From comp.compilers

Related articles
OO Parsing References? act@ipl.rpi.edu (1991-08-22)
Re: OO Parsing References? grimlok@hubcap.clemson.edu (1991-08-24)
Re: OO Parsing References? jkuha@cc.helsinki.fi (1991-08-25)
| List of all articles for this month |

Newsgroups: comp.compilers
From: jkuha@cc.helsinki.fi
Keywords: OOP, parse, bibliography
Organization: University of Helsinki
References: 91-08-125
Date: Sun, 25 Aug 1991 15:23:43 GMT

In article 91-08-125, act@ipl.rpi.edu writes:


> Does anyone have any references for object oriented techniques for
> parsing? Any assistance would be greatly appreciated.


To my knowledge, there have been two basic approaches: presenting each
nonterminal as a class, or presenting each production rule as a class.


In [RuK88] a Scheme-interpreter (originally written in C) is rewritten in
C++, using C++'s virtual functions instead of typing with tag-fields.
Somewhat surprisingly the interpreter is faster than the C version!


The approach of presenting production rules as classes can be found in
[Hed89].


One variant of the recursive descent parsing technique can be found in
[Kos90], some discussion on modularization of language implementation is in
[Kos89].


More generally, a parsing technique suitable also for OOP can be found in
[SiS88]. On page 197 they give the parsing procedure for a nonterminal A in
the recursive descent implementation of the SLL(1) parser as follows (the
rules of A are A -> w_1 | w_2 | ... | w_n):


procedure A;
begin
        if token.kind in FIRST'_1( w_1 FOLLOW'_1(A)) then begin
                  write "A -> w_1";
                  parse(w_1)
        end else
        if token.kind in FIRST'_1( w_2 FOLLOW'_1(A)) then begin
                  write "A -> w_2";
                  parse(w_2)
        end else
        ....
        if token.kind in FIRST'_1( w_n FOLLOW'_1(A)) then begin
                  write "A -> w_n";
                  parse(w_n)
        end else
                error("No A can start with this.")
end;


(different textual substitutions for 'parse(w_i)' are given
depending on if w_i starts with a terminal or a nonterminal).


Based on this, you could write an object-oriented parser as follows:


1) Present each nonterminal as a class.


2) Make a class Base to define some functions for presenting the production
rules of the nonterminal, and for constructing the sets
FIRST'_1( w_i FOLLOW'_1(A)). Derive all nonterminal- classes from Base.


You could make the parsing 'Lazy', so that the sets would not be constructed
until parsing reaches the appropriate 'if token.kind in...' statement. Or you
could make the parsing 'Late', by reading the sets from a file; if the file
doesn't exist, only then all the sets were constructed (and saved). The file
would be deleted after each linking. This way, the sets would be constructed
only when running the executable first time after linking.


(Another way of doing this would be checking if the executable is younger
than the file containing the sets.)


The details could be hidden inside a 'Parsing_table' class (by overloading
the operator [] for it, it could always be used as a normal parsing table),
which should then be a 'friend' of nonterminal classes to access the rules.
You could have a 'library' of classes presenting nonterminals, and you could
freely add and delete classes from your project. When implementing a new
language, you could use your old classes: take the arithmetical expressions
from your Pascal-implementation, loop-statements from C and so on.


Even more reusability could be gained, if also terminals were presented as
classes; each terminal could define its syntax with a regular expression.
Lexical analyzer would collect these, and construct a deterministic lexer;
when identifying a token from input, it would return the whole string to
terminal. It would be terminals's job to construct an integer from a string
of five numbers, for example.


This way, the actual lexer-class would be the same in every language
implementation. Just like the 'Parsing_table' and 'Parser' -classes.


And with careful design, the object-oriented interpreter / compiler could be
even faster than one written by hand with C, as [RuK88] shows.


Or what do you think?


---


[RuK88] Russo V., Kaplan S.: "A C++ Interpreter for Scheme". Report no.
UIUCDCS-R-88-1461, University of Illinois at Urbana-Champaign, October 1988.


[Hed89] Hedin G.: "An Object-Oriented Notation for Attribute Grammars". In:
Proc. of the European Conf. on Object-Oriented Programming (ECOOP '89),
Nottingham, 1989. The British Computer Society Workshop Series, Cambridge
University Press, 1989, 329-345.


[Kos89] Koskimies K.: "Techniques for Modular Language Implementation".
Report A-1989-5, University of Tampere (Finland), December 1988.


[Kos90] Koskimies K.: "Lazy Recursive Descent Parsing for Modular Language
Implementation". Software Practice and Experience, VOL.20 (8), August 1990,
749-772.


[SiS88] Sippu S., Soisalon-Soininen E.: "Parsing Theory, Volume I, Languages
and Parsing", EATCS Monographs on Theoretical Computer Science,
Springer-Verlag Berlin Heidelberg, 1988.


--
Jorma Kuha,


Internet: JKUHA@cc.helsinki.fi
DECnet/Funet: HYLK::JKUHA
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.