Pattern Matching with Syntax Analyzer

rezaferry@gmail.com (Reza Ferry)
5 Dec 2004 21:31:02 -0500

          From comp.compilers

Related articles
Pattern Matching with Syntax Analyzer rezaferry@gmail.com (2004-12-05)
Re: Pattern Matching with Syntax Analyzer idbaxter@semdesigns.com (Ira Baxter) (2004-12-13)

From: rezaferry@gmail.com (Reza Ferry)
Newsgroups: comp.compilers
Date: 5 Dec 2004 21:31:02 -0500
Organization: http://groups.google.com
Keywords: parse
Posted-Date: 05 Dec 2004 21:31:02 EST

Right now I'm trying to find patterns in an HTML page. For
example, I am looking for patterns of the form:
<td><a>...</a></td><td><a>...</a></td><td><a>...</a></td><td><a>...</a></td>


If I see that pattern, I will mark the first <td>. I am using a syntax
analyzer (built with javacup) to do this.


My problem is this: I need to detect several patterns, and I can't
seem to come up with a good enough set of grammar rules.


I have a document that can consist of paragraph start tags (div, p,
span), end tags, text, a tags, and separators. I abbreviate these as
b, e, t, a, and s, respectively.


A document is basically any sequence of those symbols (I don't care
about their order in the document):
D -> DC | empty
C -> b|e|s|t|a
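
As a rough illustration of the symbol alphabet above, here is a minimal
Python sketch that turns an HTML string into a string over b, e, t, a, s
using the standard html.parser module. Several details are assumptions
and are not stated in the post: separators are taken to be br and hr
tags, only closing div/p/span tags count as e, a closing </a> is
ignored, and whitespace-only text is dropped. The names SymbolTokenizer
and tokenize are hypothetical.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "span"}  # from the post: paragraph start tags
SEPARATOR_TAGS = {"br", "hr"}      # assumption: the post does not name the separators


class SymbolTokenizer(HTMLParser):
    """Collects one symbol (b, e, t, a, s) per relevant HTML event."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in BLOCK_TAGS:
            self.symbols.append("b")
        elif tag == "a":
            self.symbols.append("a")
        elif tag in SEPARATOR_TAGS:
            self.symbols.append("s")

    def handle_endtag(self, tag):
        # Assumption: only closing div/p/span tags produce an 'e'.
        if tag in BLOCK_TAGS:
            self.symbols.append("e")

    def handle_data(self, data):
        # Assumption: whitespace-only text is not a 't'.
        if data.strip():
            self.symbols.append("t")


def tokenize(html):
    """Return the document as a string over the alphabet b, e, t, a, s."""
    parser = SymbolTokenizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.symbols)
```

Any string produced this way is accepted by the grammar D -> DC | empty,
since that grammar allows every sequence of the five symbols. Note that
under these assumptions the text inside a link also shows up as a t.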


These rules let me accept any simplified HTML document. However,
because I'm trying to match particular patterns, I must also detect
the following sequences:
H1 -> s b^m t* e^n s
H2 -> s b^m t* s e
H3 -> b s t* e^n s
H4 -> b s t* s e


Here b^m means a sequence of m 'b' symbols, and e^n likewise a
sequence of n 'e' symbols.
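
If m and n are independent counts of at least 1 (an assumption; the
post does not say whether m must equal n), each of H1 to H4 is a regular
pattern over the symbols, so one possible approach is to skip the
grammar and search the symbol string with a single combined regular
expression. The Python sketch below does that with the standard re
module; the names RULES, COMBINED and find_patterns are hypothetical,
and this is a different technique from a javacup grammar, not a
translation of one.

```python
import re

# Assumption: m >= 1 and n >= 1 are independent, so b^m is b+ and e^n is e+.
RULES = {
    "H1": r"sb+t*e+s",
    "H2": r"sb+t*se",
    "H3": r"bst*e+s",
    "H4": r"bst*se",
}

# All four rules are combined into one alternation of named groups.
# Wrapping it in a zero-width lookahead lets matches overlap, for
# example when one 's' ends a pattern and starts the next.
COMBINED = re.compile(
    "(?=" + "|".join(f"(?P<{name}>{rx})" for name, rx in RULES.items()) + ")"
)


def find_patterns(symbols):
    """Return (start, rule_name, matched_text) for every rule match."""
    hits = []
    for match in COMBINED.finditer(symbols):
        for name, text in match.groupdict().items():
            if text is not None:
                hits.append((match.start(), name, text))
    return hits
```

For example, find_patterns("sbesbtse") reports H1 at position 0 and H2
at position 3, sharing the separator between them. The four rules
cannot match at the same start position as each other, so each position
yields at most one hit. If m is required to equal n, the patterns are
no longer regular and this approach does not apply.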


Is it possible to detect all of those patterns simultaneously (I can't
afford to make several passes over the document)?
Can anyone help me create these rules?

