Pattern Matching with Syntax Analyzer

rezaferry@gmail.com (Reza Ferry)
5 Dec 2004 21:31:02 -0500

From comp.compilers

Related articles
Pattern Matching with Syntax Analyzer rezaferry@gmail.com (2004-12-05)
Re: Pattern Matching with Syntax Analyzer idbaxter@semdesigns.com (Ira Baxter) (2004-12-13)

From:	rezaferry@gmail.com (Reza Ferry)
Newsgroups:	comp.compilers
Date:	5 Dec 2004 21:31:02 -0500
Organization:	http://groups.google.com
Keywords:	parse
Posted-Date:	05 Dec 2004 21:31:02 EST

I am trying to find patterns in an HTML page. For example, I am
looking for patterns of the form:
<td><a>...</a></td><td><a>...</a></td><td><a>...</a></td><td><a>...</a></td>

If I see that pattern, I will mark the first <td>. I am using a syntax
analyzer (JavaCUP) to do this.
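
As a point of comparison only, the <td> example above can be sketched with a regular expression instead of a JavaCUP grammar. The names CELL, RUN, and first_td_offsets are hypothetical, and the sketch assumes cells without attributes or whitespace, link text without nested tags, and that "the pattern" means a run of four or more such cells:

```python
import re

# Assumption: no attributes, no whitespace between tags, no nested tags in link text.
CELL = r"<td><a>[^<]*</a></td>"

# Assumption: four or more consecutive link cells count as one occurrence.
RUN = re.compile(f"(?:{CELL}){{4,}}")

def first_td_offsets(html):
    """Return the offset of the first <td> of every run of link cells."""
    return [m.start() for m in RUN.finditer(html)]
```

The returned offsets are the positions where a marker for the first <td> could be placed.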

My problem is this: I need to detect several patterns, and I can't
seem to write a good enough set of grammar rules.

I have a document that can consist of paragraph start tags (div, p,
span), end tags, text, <a> tags, and separators (tokens b, e, t, a,
and s, respectively).

A document is simply any sequence of those tokens (I don't care about
their order in the document):
D -> DC | empty
C -> b|e|s|t|a
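
Because D is just zero or more repetitions of C, it derives exactly the strings over the five tokens, including the empty string. A minimal sketch of that equivalence, assuming the document has already been reduced to a string of one-letter tokens (the names ALPHABET and is_document are hypothetical):

```python
import re

ALPHABET = "besta"  # the five terminals of rule C

def is_document(tokens):
    """True if the token string can be derived from D -> D C | empty."""
    return re.fullmatch(f"[{ALPHABET}]*", tokens) is not None
```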

That rule lets me accept any simplified HTML document. However,
because I'm trying to match particular patterns, I must also detect
the following rules:
H1 -> s b^m t* e^n s
H2 -> s b^m t* s e
H3 -> b s t* e^n s
H4 -> b s t* s e

Here b^m means a sequence of m 'b' tokens, and e^n a sequence of n 'e' tokens.
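
Since m and n are independent of each other, each of H1 to H4 describes a regular language, so one way to look for all four at once is a single combined regular expression over the token string. This is a sketch of that alternative, not a JavaCUP grammar. It assumes m >= 1 and n >= 1 (the bounds are not stated above), and the names PATTERNS, COMBINED, and find_patterns are hypothetical:

```python
import re

# Assumption: m >= 1 and n >= 1, so b^m becomes b+ and e^n becomes e+.
PATTERNS = {
    "H1": r"sb+t*e+s",
    "H2": r"sb+t*se",
    "H3": r"bst*e+s",
    "H4": r"bst*se",
}

# One expression for all four rules. Wrapping the alternatives in a
# lookahead keeps each match zero-width, so overlapping occurrences
# (for example two H1 matches sharing an 's') are all reported.
COMBINED = re.compile(
    "(?=" + "|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items()) + ")"
)

def find_patterns(tokens):
    """Return (rule, start, end) for every occurrence, ordered by start."""
    hits = []
    for m in COMBINED.finditer(tokens):
        for name in PATTERNS:
            if m.group(name) is not None:
                hits.append((name, m.start(name), m.end(name)))
    return hits
```

For example, find_patterns("sbesbtes") returns [("H1", 0, 4), ("H1", 3, 8)]: the two H1 occurrences share the separator at index 3. At any start position at most one rule can match, because H1 and H2 begin with s while H3 and H4 begin with b, and within each pair the token after t* differs (e versus s).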

Is it possible to detect all of those rules simultaneously, in a
single parse (I can't afford to parse the document several times)?
Can anyone help me write these rules?

