Pattern Matching with Syntax Analyzer (Reza Ferry)
5 Dec 2004 21:31:02 -0500

          From comp.compilers

Related articles
Pattern Matching with Syntax Analyzer (2004-12-05)
Re: Pattern Matching with Syntax Analyzer (Ira Baxter) (2004-12-13)

From: (Reza Ferry)
Newsgroups: comp.compilers
Date: 5 Dec 2004 21:31:02 -0500
Keywords: parse
Posted-Date: 05 Dec 2004 21:31:02 EST

Right now I'm trying to find patterns in an HTML page. For
example, I am looking for patterns of the form:

If I see that pattern, I will mark the first <td>. I am using a syntax
analyzer (javacup) to do this.

My problem is this: I need to detect certain patterns, and I can't
seem to come up with a good enough set of grammar rules.

I have a document that can contain paragraph start tags (div, p,
span), end tags, text, a tags, and separators (b, e, t, a, and s
respectively).

A document is basically any sequence of those symbols (I don't care
about their order in the document):
D -> DC | empty
C -> b|e|s|t|a
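
To make the symbol alphabet concrete, here is a minimal sketch of how an HTML fragment might be reduced to a string over b, e, t, a, s before any grammar rules are applied. It is not taken from the post: the class name SymbolEncoder, the choice of br and hr as separator tags, and the decision to ignore every other tag are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SymbolEncoder {
    // Paragraph tags named in the post.
    private static final Set<String> PARAGRAPH =
        new HashSet<>(Arrays.asList("div", "p", "span"));
    // ASSUMPTION: the post does not say which tags are separators;
    // br and hr are placeholders.
    private static final Set<String> SEPARATOR =
        new HashSet<>(Arrays.asList("br", "hr"));
    // Either a tag (group 1 = optional '/', group 2 = tag name)
    // or a run of text between tags (group 3).
    private static final Pattern PIECE =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)");

    /** Encodes a simplified HTML fragment as a string of b, e, t, a, s. */
    static String encode(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = PIECE.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                // Whitespace-only text produces no symbol.
                if (!m.group(3).trim().isEmpty()) out.append('t');
                continue;
            }
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (PARAGRAPH.contains(name)) {
                out.append(closing ? 'e' : 'b');
            } else if (SEPARATOR.contains(name) && !closing) {
                out.append('s');
            } else if (name.equals("a") && !closing) {
                out.append('a');
            }
            // ASSUMPTION: all other tags are skipped.
        }
        return out.toString();
    }
}
```

Every string this produces is accepted by D, since D is simply zero or more C symbols. The sketch does not handle comments, scripts, or malformed markup.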

That rule lets me accept any simplified HTML document. However,
because I'm trying to match a particular pattern, I must also
detect the following rules:
H1 -> s b^m t* e^n s
H2 -> s b^m t* s e
H3 -> b s t* e^n s
H4 -> b s t* s e

b^m means a sequence of m occurrences of 'b' (and likewise e^n for 'e').
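
If m and n are independent of each other, each of H1 to H4 describes a regular language, so the rules can be written as ordinary regular expressions over the symbol string. The sketch below does that, assuming m >= 1 and n >= 1 (the post does not state the bounds); the class name RuleRegexes and the method classify are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class RuleRegexes {
    // ASSUMPTION: m >= 1 and n >= 1, chosen independently,
    // so b^m becomes "b+" and e^n becomes "e+".
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("H1", Pattern.compile("sb+t*e+s"));
        RULES.put("H2", Pattern.compile("sb+t*se"));
        RULES.put("H3", Pattern.compile("bst*e+s"));
        RULES.put("H4", Pattern.compile("bst*se"));
    }

    /** Returns the name of the rule matching the whole string, or null if none does. */
    static String classify(String symbols) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(symbols).matches()) {
                return rule.getKey();
            }
        }
        return null;
    }
}
```

If m and n had to be equal (balanced start and end tags), the rules would no longer be regular and this translation would not apply.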

Is it possible to detect all of those rules simultaneously (I can't
afford to do it several times)?
Can anyone help me create these rules?
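
One possible way to look for all four rules in a single left-to-right scan is to combine them into one alternation and try it at each position of the symbol string. This is a sketch using java.util.regex rather than a javacup grammar, and it assumes m and n are independent and at least 1; the class name OnePassScanner and the "rule@index" result format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OnePassScanner {
    private static final String[] NAMES = {"H1", "H2", "H3", "H4"};
    // ASSUMPTION: b^m -> "b+" and e^n -> "e+" (m, n >= 1, independent).
    // The lookahead consumes nothing, so a separator that ends one
    // match can also start the next one.
    private static final Pattern ANY = Pattern.compile(
        "(?=(?<H1>sb+t*e+s)|(?<H2>sb+t*se)|(?<H3>bst*e+s)|(?<H4>bst*se))");

    /** Scans once and returns hits such as "H1@0" (rule name @ start index). */
    static List<String> scan(String symbols) {
        List<String> hits = new ArrayList<>();
        Matcher m = ANY.matcher(symbols);
        while (m.find()) {
            for (String name : NAMES) {
                // Only the alternative that matched has a non-null group.
                if (m.group(name) != null) {
                    hits.add(name + "@" + m.start());
                }
            }
        }
        return hits;
    }
}
```

For example, scan("sbesbes") reports H1 at index 0 and again at index 3, because the separator at index 3 closes the first match and opens the second. Since the four rules are regular under the stated assumption, the same idea could also be compiled into a single deterministic automaton.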
