Related articles |
---|
Pattern Matching with Syntax Analyzer rezaferry@gmail.com (2004-12-05) |
Re: Pattern Matching with Syntax Analyzer idbaxter@semdesigns.com (Ira Baxter) (2004-12-13) |
From: | rezaferry@gmail.com (Reza Ferry) |
Newsgroups: | comp.compilers |
Date: | 5 Dec 2004 21:31:02 -0500 |
Organization: | http://groups.google.com |
Keywords: | parse |
Posted-Date: | 05 Dec 2004 21:31:02 EST |
Right now I'm trying to find patterns in an HTML page. For
example, I am looking for patterns of the form:
<td><a>...</a></td><td><a>...</a></td><td><a>...</a></td><td><a>...</a></td>
If I see that pattern, I will mark the first <td>. I am using a syntax
analyzer (javacup) to do this.
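For comparison with the grammar-based approach, here is a minimal sketch of the same idea using a plain regular expression in Python. It is not the javacup solution. It assumes each cell looks exactly like <td><a ...>text</a></td> with no nested tags inside the link, and the function name mark_first_td and the marker attribute are made up for illustration.

```python
import re

# Assumption: a cell is <td><a ...>text</a></td>, with no tags inside the link
# text and only whitespace between cells. Real-world HTML needs a real parser.
CELL = r"<td>\s*<a\b[^>]*>[^<]*</a>\s*</td>\s*"
RUN_OF_FOUR = re.compile(r"(?:%s){4}" % CELL, re.IGNORECASE)


def mark_first_td(html, marker='<td class="marked">'):
    """Replace the first <td> of each run of four link cells with `marker`."""
    def _mark(match):
        run = match.group(0)
        # Drop the leading "<td>" (4 characters) and put the marker in its place.
        return marker + run[len("<td>"):]
    return RUN_OF_FOUR.sub(_mark, html)
```

Runs shorter than four cells are left untouched, and in a longer run only the first cell of each group of four is marked.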
My problem is this: I need to detect certain patterns, and I
can't seem to come up with a good enough set of grammar rules.
I have a document which basically consists of paragraph
start tags (div, p, span), end tags, text, a tags, and separators
(denoted b, e, t, a, and s respectively).
A document is any combination of those symbols (I don't care
about their order in the document):
D -> DC | empty
C -> b|e|s|t|a
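Since D -> DC | empty with C -> b|e|s|t|a generates every string over the five symbols, it describes the same language as the regular expression (b|e|s|t|a)*. A small Python sketch of that check follows; it assumes the page has already been reduced to a string of these symbols, and the name is_document is hypothetical.

```python
import re

# D -> DC | empty, C -> b|e|s|t|a  is equivalent to  (b|e|s|t|a)*
DOCUMENT = re.compile(r"[besta]*")


def is_document(symbols):
    """True if `symbols` is a (possibly empty) string over b, e, s, t, a."""
    return DOCUMENT.fullmatch(symbols) is not None
```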
These rules let me accept any simplified HTML document.
However, because I'm trying to match particular patterns, I must also
detect the following rules:
H1 -> s b^m t* e^n s
H2 -> s b^m t* s e
H3 -> b s t* e^n s
H4 -> b s t* s e
Here b^m means a sequence of m 'b's (and likewise e^n a sequence of n 'e's).
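If m and n are unrelated and at least 1 (an assumption; the rules do not say), then H1 to H4 are all regular and can be tried together in one scan of the symbol string. The following Python sketch illustrates that idea with a single alternation wrapped in a lookahead, so that overlapping occurrences are also reported. It is not a javacup grammar, and the name find_patterns is hypothetical. If m had to equal n, a regular expression would no longer be enough.

```python
import re

# Assumption: b^m and e^n mean one or more repetitions, with m and n unrelated.
# The lookahead makes each match zero-width, so occurrences may overlap.
PATTERNS = re.compile(
    r"(?=(?P<H1>sb+t*e+s)"
    r"|(?P<H2>sb+t*se)"
    r"|(?P<H3>bst*e+s)"
    r"|(?P<H4>bst*se))"
)


def find_patterns(symbols):
    """Return (rule, start, matched_text) for every H1..H4 occurrence."""
    hits = []
    for m in PATTERNS.finditer(symbols):
        # At most one alternative can match at a given start position.
        for rule, text in m.groupdict().items():
            if text is not None:
                hits.append((rule, m.start(), text))
    return hits
```

For example, find_patterns("sbesbes") reports H1 at positions 0 and 3, even though the two occurrences share the middle 's'.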
Is it possible to detect all of these rules simultaneously (I can't afford to
run the analysis several times)?
Can anyone help me create these rules?