Related articles
parse tree representation for LPM qjackson@mail.direct.ca (1996-05-19)
From: qjackson@mail.direct.ca
Newsgroups: comp.compilers
Date: 19 May 1996 17:39:55 -0400
Organization: Parsepolis Software ("ParseCity")
Keywords: parse, question
If anyone could provide any pointers on the following, I would greatly
appreciate it. I'll only provide as much introduction as necessary to deal
with the matter at hand.
I have implemented a pattern matching language class hierarchy in
C++. This engine can match pattern rules of arbitrary complexity
against data streams of arbitrary length.
One mechanism that LPM has is known as the "class clause", which
is written in the form:
['class_name'!
"Classes" may have three types of "members": literal (sometimes
called terminal), rule (i.e., sub-clauses), or algorithmic (i.e., C++
functions that, through their own internal logic, match 0 to n
characters from the stream).
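As a rough illustration only, the three member kinds could be modeled along the following lines. Every name here (Member, LiteralMember, RuleMember, AlgorithmicMember, PatternClass, kNoMatch) is hypothetical and is not taken from LPM's actual hierarchy. The sketch also assumes that a rule is a plain sequence of parts and that a class picks the longest of its matching members; the real engine may well do neither.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not LPM's actual class hierarchy.
// match() returns the number of characters consumed at `pos`, or kNoMatch.
const std::size_t kNoMatch = static_cast<std::size_t>(-1);

struct Member {
    virtual ~Member() = default;
    virtual std::size_t match(const std::string& input, std::size_t pos) const = 0;
};

// Literal (terminal) member: matches one fixed string.
struct LiteralMember : Member {
    std::string text;
    explicit LiteralMember(std::string t) : text(std::move(t)) {}
    std::size_t match(const std::string& input, std::size_t pos) const override {
        if (pos > input.size()) return kNoMatch;
        return input.compare(pos, text.size(), text) == 0 ? text.size() : kNoMatch;
    }
};

// Algorithmic member: an arbitrary function that consumes 0 to n characters.
struct AlgorithmicMember : Member {
    typedef std::function<std::size_t(const std::string&, std::size_t)> MatchFn;
    MatchFn fn;
    explicit AlgorithmicMember(MatchFn f) : fn(std::move(f)) {}
    std::size_t match(const std::string& input, std::size_t pos) const override {
        return fn(input, pos);
    }
};

// Rule member: assumed here to be a simple sequence of sub-matches.
struct RuleMember : Member {
    std::vector<std::shared_ptr<const Member>> parts;
    std::size_t match(const std::string& input, std::size_t pos) const override {
        std::size_t cur = pos;
        for (const auto& part : parts) {
            const std::size_t n = part->match(input, cur);
            if (n == kNoMatch) return kNoMatch;  // one part failed: whole rule fails
            cur += n;
        }
        return cur - pos;
    }
};

// A class: a named set of members. Assumption: the longest match wins.
// It is itself a Member so that a rule can refer to a class.
struct PatternClass : Member {
    std::string name;
    std::vector<std::shared_ptr<const Member>> members;
    explicit PatternClass(std::string n) : name(std::move(n)) {}
    std::size_t match(const std::string& input, std::size_t pos) const override {
        std::size_t best = kNoMatch;
        for (const auto& m : members) {
            const std::size_t n = m->match(input, pos);
            if (n != kNoMatch && (best == kNoMatch || n > best)) best = n;
        }
        return best;
    }
};
```

Making PatternClass a Member is one way to let a class clause appear inside a rule; whether LPM resolves class names this way is not stated here.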
A typical use of the class mechanism might be:
LpmClass "ARTICLE" {
    ; terminal members
    "the",
    "a",
    "an"
    ; et cetera
}
LpmClass "ADVERB" {
    "very", "somewhat" ; et cetera
}
LpmClass "ADJECTIVE" {
    "big", "fat", "old" ; et cetera
}
LpmClass "ADJ_PHRASE" {
    [@(
        ['ADVERB'!
        <ws>
    [)
    ['ADJECTIVE'!
}
LpmClass "NOUN" {
    "man", "cat" ; et cetera
}
LpmClass "NOUN_PHRASE" {
    [@( ; optional subsection
        ['ARTICLE'!
        <ws>
    [)
    [@(
        [@'ADJ_PHRASE'! ; optional class match
        <ws>
    [)
    ['NOUN'!
}
The rule <ws>['NOUN_PHRASE'!<ws> would match the following:
I saw the very old man in the store.
     ^^^^^^^^^^^^^^^^^^
That is a big cat on the ledge.
       ^^^^^^^^^^^
That said, here's what I'd like to add --
I feel it would be useful if parse-tree information could be
[optionally] generated during a match. My query concerns the format in
which this symbolic information should be delivered to the developer. I
have several ideas, but I feel that I would rather hear what some of
the participants here have to say before deciding on a final model.
Some of my concerns:
1. The parse-tree information should [optionally?] be returned
in a format that is itself efficiently parsed by other tools,
as well as [optionally?] be human-readable.
2. The building of the information should take into account the
many potential look-aheads and false leads that this type of
brute-force parse inevitably takes. Pruning those branches
that lead nowhere should be as efficient as possible.
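To make the two concerns concrete, here is a minimal sketch of one possible representation. It is not a proposal that LPM is known to use. It assumes nodes are appended to a flat vector in preorder, so that abandoning a false lead is just a truncation back to a saved mark, and that the result is emitted as an S-expression, which is both easy to re-parse and readable. ParseNode, ParseTreeBuilder and all member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names come from LPM.
struct ParseNode {
    std::string label;   // class or rule name, e.g. "NOUN_PHRASE"
    std::size_t begin;   // offset of the first matched character
    std::size_t end;     // one past the last matched character
    std::size_t depth;   // nesting depth; 0 for a top-level node
};

class ParseTreeBuilder {
public:
    // Remember the current length of the trail before trying an alternative.
    std::size_t mark() const { return nodes_.size(); }

    // Prune a failed alternative: drop every node recorded since `m`.
    // The cost is proportional to the number of nodes discarded.
    void rollback(std::size_t m) {
        assert(m <= nodes_.size());
        nodes_.erase(nodes_.begin() + static_cast<std::ptrdiff_t>(m), nodes_.end());
        while (!open_.empty() && open_.back() >= m) open_.pop_back();
    }

    // Start a node at `begin`; it stays open until close() is called.
    void open(std::string label, std::size_t begin) {
        nodes_.push_back(ParseNode{std::move(label), begin, begin, open_.size()});
        open_.push_back(nodes_.size() - 1);
    }

    // Finish the innermost open node, recording where its match ended.
    void close(std::size_t end) {
        assert(!open_.empty());
        nodes_[open_.back()].end = end;
        open_.pop_back();
    }

    std::size_t size() const { return nodes_.size(); }

    // Emit "(LABEL begin end child...)" for each node. Nodes are stored in
    // preorder, so the depth field is enough to place the closing parens.
    std::string to_sexpr() const {
        std::string out;
        std::size_t open_parens = 0;
        for (const ParseNode& n : nodes_) {
            while (open_parens > n.depth) { out += ')'; --open_parens; }
            if (!out.empty()) out += ' ';
            out += '(' + n.label + ' ' + std::to_string(n.begin) + ' ' +
                   std::to_string(n.end);
            ++open_parens;
        }
        while (open_parens > 0) { out += ')'; --open_parens; }
        return out;
    }

private:
    std::vector<ParseNode> nodes_;   // preorder trail of nodes
    std::vector<std::size_t> open_;  // indices of nodes not yet closed
};
```

A matcher would call mark() before each alternative and rollback() when that alternative fails, so only nodes on the successful path survive. Storing offsets rather than copies of the matched text keeps the trail small; a caller that wants the text can slice it from the original stream.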
I invite your comments on this.
Cheers,
Quinn Tyler Jackson
--
Parsepolis Software || Quinn Tyler Jackson
"ParseCity" || qjackson@direct.ca
+------http:/mypage.direct.ca/q/qjackson/-------->