Re: Pattern Matching with Syntax Analyzer

"Ira Baxter" <idbaxter@semdesigns.com>
13 Dec 2004 02:05:03 -0500

          From comp.compilers

Related articles
Pattern Matching with Syntax Analyzer rezaferry@gmail.com (2004-12-05)
Re: Pattern Matching with Syntax Analyzer idbaxter@semdesigns.com (Ira Baxter) (2004-12-13)
| List of all articles for this month |
From: "Ira Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: 13 Dec 2004 02:05:03 -0500
Organization: http://extra.newsguy.com
References: 04-12-026
Keywords: parse
Posted-Date: 13 Dec 2004 02:05:03 EST

"Reza Ferry" <rezaferry@gmail.com> wrote
> Right now I'm trying to find patterns in a html page. For
> example I am looking for patterns in the form of:
>
<td><a>...</a></td><td><a>...</a></td><td><a>...</a></td><td><a>...</a></td>
>
> [snip] I am using a syntax analyzer (javacup) to do this.
>
>[snip]
> I have a Document which basically can consist of several paragraph
> start tags (Div, p, span), end tags, text, a tags, and separators
> (respectively b, e, t, a, s)
>
> a document is basically a combination of those tags (I don't care
> about the order in the document)
> D -> DC | empty
> C -> b|e|s|t|a
>
> That rule will enable me to accept any simplified html document
> However because I'm trying to match a particular pattern I must also
> detect the following rules
> H1 -> s b^m t* e^n s
> H2 -> s b^m t* s e
> H3 -> b s t* e^n s
> H4 -> b s t* s e
>
> b^m means a sequence of m number of 'b'


Do you insist on a specific numbers n and m?
Rather it appears that you are interested in n and m=1 and >1.
Would the rules


H1 -> s b+ t* e e+ s
H2 -> s b+ t* s e
H3 -> b s t* e e+ s
H4 -> b s t* s e


do the job? If so, then a "deterministic" parsing technology
(accepts first match, such as I beleive that JCup is,
better facts welcome) will likely do the trick.


If these rules overlap (can produce ambiguous parses), then you need a
parser that can provide *all* the possible parses ("matches"). GLR is
a good technology for that.


Our DMS Software Reengineering toolkit contains a GLR parser, and
could easily do this example.
--
Ira D. Baxter, Ph.D., CTO 512-250-1018
Semantic Designs, Inc. www.semdesigns.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.