Re: Writing an interpreter


Related articles
Writing an interpreter Arne.Evertsson@Octalogic.se (1996-06-01)
Re: Writing an interpreter oplec@westminster.ac.uk (Psi) (1996-06-08)
Re: Writing an interpreter emcee@efi.com (1996-06-08)
Re: Writing an interpreter adf@idt.unit.no (1996-06-13)
Re: Writing an interpreter qjackson@direct.ca (1996-06-13)
From: qjackson@direct.ca
Newsgroups: comp.compilers
Date: 13 Jun 1996 18:09:09 -0400
Organization: Parsepolis Software ("ParseCity")
Keywords: interpreter, design

Lambert Lum wrote:


> In my approach to an interpreter, I took a really unorthodox
> approach. [...] I'm embedding an interpreter into flex.


[...]


> I just set the regular expressions to pick up what I want, and then
> for every regular expression, I got a pre-defined behavior in
> response. Simple, heh?


> As of now, I'm still in the beginning stages, but I'm wondering if
> the great gurus of compilers would be kind enough to offer opinions
> on my unorthodox approach of using flex. Will I fail? Will I
> succeed? Is flex a standard tool in interpreter development, but
> I've been too dense to figure it out?


About which John Levine said:


> [That approach can certainly be made to work, but after a while
> you'll probably find that you'll be happier translating to a
> tokenized internal form, either trees and RPN, and interpreting
> that. It's considerably faster when you don't have to rescan the
> source each time through a loop. -John]


I agree with John on this, and will add several comments of my own
based upon recent experience with a pattern matcher and a math
expression parser, a team of classes of my own design.


With CLpm, the pattern matching class, pattern rules are tokenized at
instantiation. One pattern, taken from some freshly written code,
reads:


    CLpm graphRule
    (
        "['GRAPH'$" // match the literal "GRAPH"
        "<ws>" // match any amount of whitespace
        "[^" // anchor
            "[+'A-Z'#" // RE: [A-Z]+
            "['(X)'$" // match the literal "(X)"
        "['fun'^" // cast from anchor to here into "fun"
        "<ws>"
        "['WITH'$" // literal
        "<ws>"
        "[^"
            "[+' '{NOT}#" // RE: [^ ]+
        "['min'^"
        "<ws>"
        "['TO'$"
        "<ws>"
        "[^"
            "[+' '{NOT}#"
        "['max'^",


        graphLink // a pointer to a function declared elsewhere


    );


(Yeah, yeah, it looks like Godzilla's breakfast, but stringing it out
vertically like that makes it easier to modify later and it all looks
the same to the C++ scanner anyway.)


Okay, suppose the user inputs the following commands at the LPM demo prompt:


      > f(x) := cos(x)^2*x


      > graph f(x) with -10 to 10


The second command would trigger the above rule. The code link given
in the second parameter at instantiation would be called
asynchronously whenever this particular rule was matched against a
stream. It is assumed that graphLink(), which accepts several
parameters, one of them a pointer to the rule that triggered it,
contains the appropriate code to deal with the actual values supplied
in the parameters, and would graph the function accordingly.
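
For illustration, here is a minimal sketch of that callback scheme.
Since the actual CLpm interface isn't shown above, the Captures map
and graphLink()'s signature here are stand-ins, not the real class:

    #include <cstdlib>
    #include <iostream>
    #include <map>
    #include <string>

    // Stand-in for whatever CLpm actually hands the code link; assume
    // the values cast into "fun", "min", and "max" arrive by name.
    typedef std::map<std::string, std::string> Captures;

    // Hypothetical signature; the real graphLink() takes several
    // parameters, including a pointer to the triggering rule.
    void graphLink(Captures& cap)
    {
        double lo = std::atof(cap["min"].c_str()); // e.g. -10
        double hi = std::atof(cap["max"].c_str()); // e.g. 10
        std::cout << "graph " << cap["fun"]
                  << " over [" << lo << ", " << hi << "]\n";
        // ...evaluate the tokenized expression at sample points here...
    }

    int main()
    {
        Captures cap; // as if "graph f(x) with -10 to 10" just matched
        cap["fun"] = "F(X)";
        cap["min"] = "-10";
        cap["max"] = "10";
        graphLink(cap);
        return 0;
    }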


This is similar to Lambert Lum's approach, except that rules are
pre-tokenized into tables of enumerated types, et cetera, for speed.


The expression f(x):=cos(x)^2*x is likewise tokenized only once,
since it is the target of a loop that iterates 100 times to draw a
decent graph. Retokenizing on each pass of the graph loop was about
four times slower, with compiler optimization turned off. Before I
pre-tokenized expressions, the nesting of functions and the length of
identifier names greatly affected speed. (Of course!)
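
To make the payoff concrete, here is a minimal sketch of the
tokenize-once scheme. This is not the actual code of my classes, just
the general shape: the expression is reduced to an RPN token array a
single time, and the graph loop only interprets that array:

    #include <cmath>
    #include <cstdio>

    // f(x) = cos(x)^2*x, tokenized once into RPN: x cos 2 ^ x *
    enum Op { PUSH_X, PUSH_CONST, COS, POW, MUL };
    struct Token { Op op; double value; };

    static const Token program[] = {
        { PUSH_X, 0 }, { COS, 0 }, { PUSH_CONST, 2 },
        { POW, 0 }, { PUSH_X, 0 }, { MUL, 0 }
    };
    static const int proglen = sizeof(program) / sizeof(program[0]);

    static double eval(double x)
    {
        double stack[16];
        int top = 0;
        for (int i = 0; i < proglen; ++i) {
            switch (program[i].op) {
            case PUSH_X:     stack[top++] = x;                break;
            case PUSH_CONST: stack[top++] = program[i].value; break;
            case COS: stack[top - 1] = cos(stack[top - 1]);   break;
            case POW: stack[top - 2] = pow(stack[top - 2],
                                           stack[top - 1]); --top; break;
            case MUL: stack[top - 2] *= stack[top - 1]; --top; break;
            }
        }
        return stack[0];
    }

    int main()
    {
        // The graph loop: 100 samples, no rescanning of source text.
        for (int i = 0; i < 100; ++i) {
            double x = -10.0 + 20.0 * i / 99.0;
            printf("f(%g) = %g\n", x, eval(x));
        }
        return 0;
    }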


I have found C++ to be the language of choice for implementing this
type of "run-time compilation" system, since instantiation and
.operator=() allow for two logical points of tokenization. There is
also room for self-modifying expressions and rules in CLpm/CArdaf,
which I don't think a lex/flex based system would allow for, since
lex produces static tables rather than an RE interpreter. (At least,
the versions of lex I've reviewed all produce C code and tables,
rather than an RE interpreter.)
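
Here is a bare-bones illustration of those two tokenization points,
as a hypothetical class, not CLpm itself:

    #include <string>
    #include <vector>

    class Rule
    {
    public:
        // Point 1: tokenize when the rule is instantiated.
        Rule(const char* src) { Tokenize(src); }

        // Point 2: tokenize again on assignment, so a rule can be
        // replaced at run time, even from within its own callback;
        // that is what leaves room for self-modifying rules.
        Rule& operator=(const char* src) { Tokenize(src); return *this; }

    private:
        void Tokenize(const std::string& src)
        {
            tokens.clear();
            // ...compile src into the internal token table here...
            (void)src; // placeholder body for this sketch
        }
        std::vector<int> tokens;
    };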


Henry Spencer's regexp.c could, I suppose, be integrated into a
lex-generated system to allow for run-time RE parsing.
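
For comparison, run-time RE compilation with the POSIX
regcomp()/regexec() interface looks like this (similar in spirit to
the compile-then-execute pair in Spencer's package; this illustrates
the idea rather than any CLpm internals):

    #include <regex.h>
    #include <stdio.h>

    int main(void)
    {
        regex_t re;

        // The pattern is compiled while the program runs, not baked
        // into static tables at build time the way lex output is.
        if (regcomp(&re, "[A-Z]+\\(X\\)", REG_EXTENDED) != 0)
            return 1;

        if (regexec(&re, "GRAPH F(X) WITH -10 TO 10", 0, NULL, 0) == 0)
            printf("rule would fire\n");

        regfree(&re);
        return 0;
    }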


Cheers,




Quinn




--
        Parsepolis Software || Quinn Tyler Jackson
                "ParseCity" || qjackson@direct.ca
+------http://mypage.direct.ca/q/qjackson/-------->