parse tree representation for LPM

qjackson@mail.direct.ca
19 May 1996 17:39:55 -0400

          From comp.compilers

Related articles
parse tree representation for LPM qjackson@mail.direct.ca (1996-05-19)
| List of all articles for this month |
From: qjackson@mail.direct.ca
Newsgroups: comp.compilers
Date: 19 May 1996 17:39:55 -0400
Organization: Parsepolis Software ("ParseCity")
Keywords: parse, question

If anyone could provide any pointers on the following, I would greatly
appreciate it. I'll only provide as much introduction as necessary to deal
with the matter at hand.


I have implemented a pattern matching language class hierarchy in
C++. This engine can match pattern rules of arbitrary complexity
against data streams of arbitrary length.


One mechanism that LPM has is known as the "class clause", which
is written in the form:


        ['class_name'!


"Classes" may have three types of "members" -- literal (sometimes
called terminal), rule (ie. sub-clauses), or algorithmic (ie. C++
functions that, through their own internal logic, match 0 to n
characters from the stream).


A typical approach of the class mechanism might be:


LpmClass "ARTICLE" {


        ; terminal members


        "the",
        "a",
        "an"
        ; et cetera


}


LpmClass "ADVERB" {


        "very", "somewhat" ; et cetera


}


LpmClass "ADJECTIVE" {


        "big", "fat", "old" ; et cetera


}


LpmClass "ADJ_PHRASE {


        [@(
                ['ADVERB'!
                <ws>
        [)
        ['ADJECTIVE'!


}


LpmClass "NOUN" {


        "man", "cat" ; et cetera


}


LpmClass "NOUN_PHRASE" {


        [@( ; optional subsection
                ['ARTICLE'!
                <ws>
        [)
        [@(
                [@'ADJ_PHRASE'! ; optional class match
                <ws>
        [)
        ['NOUN'!
}


The rule <ws>['NOUN_PHRASE!<ws> would match the following:


          I saw the very old man in the store.
                    ^^^^^^^^^^^^^^^^^^


          That is a big cat on the ledge.
                        ^^^^^^^^^^^


That said, here's what I'd like to add --


I feel it would be useful if parse-tree information could be
[optionally] generated during a match. My query concerns what format
this symbolic information should be delivered to the developer. I
have several ideas, but I feel that I would rather hear what some of
the participants here have to say before deciding on a final model.


Some of my concerns:


        1. The parse-tree information should [optionally?] be returned
                in a format that is itself efficiently parsed by other tools,
                as well as [optionally?] be human-readable.


        2. The building of the information should take into account the
                many potential look-aheads and false leads that this type of
                brute force parse inevitably takes. Pruning those branches
                that lead nowhere should be optimally efficient.


I invite your comments on this.




Cheers,




Quinn Tyler Jackson


--
        Parsepolis Software || Quinn Tyler Jackson
                "ParseCity" || qjackson@direct.ca
+------http:/mypage.direct.ca/q/qjackson/-------->


--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.