Re: Regular Expressions

Martin Ward <Martin.Ward@durham.ac.uk>
12 Oct 2004 00:53:42 -0400

From comp.compilers



From:	Martin Ward <Martin.Ward@durham.ac.uk>
Newsgroups:	comp.compilers
Date:	12 Oct 2004 00:53:42 -0400
Organization:	Compilers Central
References:	04-10-069
Keywords:	lex
Posted-Date:	12 Oct 2004 00:53:42 EDT

On Sunday 10 Oct 2004 3:34 am, Mark wrote:
> I just can't seem to figure out how to invent a regular expression
> that will strip all HTML tags (except TABLE tags) out of a string and
> leave the rest of the text. When a TABLE tag is encountered I need to
> strip everything under it.

The Perl module HTML::Parser
http://www.perldoc.com/perl5.6.1/lib/HTML/Parser.html
is probably a good starting point.
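
To illustrate the event-driven approach such a parser offers, here is a
minimal sketch. It uses Python's standard html.parser module as a stand-in
for HTML::Parser, not the Perl API itself. The names TagStripper and
strip_tags are hypothetical, and the sketch assumes one reading of the
question: every tag is dropped, text is kept, and everything inside a
TABLE element (nested tables included) is discarded.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keep text found outside <table> elements; discard all tags.

    Assumption: "strip everything under it" means the whole contents of
    a table are dropped, not just the table markup.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # text fragments to keep
        self.table_depth = 0   # > 0 while inside one or more <table> elements

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TABLE> and <table> both match.
        if tag == "table":
            self.table_depth += 1

    def handle_endtag(self, tag):
        # Ignore a stray </table> that has no matching opening tag.
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        # Text is kept only when we are not inside any table.
        if self.table_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the text of `html` with tags and table contents removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p><TABLE><tr><td>x</td></tr></TABLE>!"))
```

A depth counter is used rather than a boolean flag so that nested tables
are handled; a single regular expression cannot track that nesting
reliably, which is why a parser is the better starting point here.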

--
Martin

Martin.Ward@durham.ac.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/

