Re: Regular Expressions

David Z Maze
12 Oct 2004 00:56:48 -0400

          From comp.compilers

From: David Z Maze
Newsgroups: comp.compilers
Date: 12 Oct 2004 00:56:48 -0400
Organization: Compilers Central
References: 04-10-069
Keywords: lex
Posted-Date: 12 Oct 2004 00:56:48 EDT

(Mark) writes:

> I just can't seem to figure out how to invent a regular expression
> that will strip all HTML tags (except TABLE tags) out of a string
> and leave the rest of the text. When a TABLE tag is encountered i
> need to strip everything under it.

If your HTML happens to be well-formed XML, then you could do this
very easily with an XSLT [1] stylesheet:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>
        <xsl:template match="table"/>
    </xsl:stylesheet>

The default behavior of XSLT is pretty much just to strip tags; here,
you're adding a template that says "when you find a table, do
nothing", including not recursing into children to print text.
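
If you don't have an XSLT processor handy, the same idea (emit the text,
skip any table subtree) can be sketched in Python with the standard
xml.etree.ElementTree module. This is a stand-in for the stylesheet, not
XSLT itself, and it relies on the same assumption that the input is
well-formed XML. The function name text_without_tables and the sample
document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def text_without_tables(elem):
    """Return the text under elem, skipping every <table> subtree."""
    if elem.tag == "table":
        return ""  # like the empty template: no output, no recursion
    parts = [elem.text or ""]
    for child in elem:
        parts.append(text_without_tables(child))
        parts.append(child.tail or "")  # text that follows the child's end tag
    return "".join(parts)


# Hypothetical sample input, assumed to be well-formed XML.
doc = ET.fromstring(
    "<html><body><p>Hello <b>world</b>!</p>"
    "<table><tr><td>hidden</td></tr></table>"
    "<p>Bye</p></body></html>"
)
result = text_without_tables(doc)
print(result)
```

Like the match="table" pattern, the tag comparison is case-sensitive, so
an upper-case TABLE element would need separate handling.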

> But how do I make it also strip entire TABLE elements?

Assuming you don't have nested tables, you could replace each span
running from an opening TABLE tag through the nearest closing TABLE tag
with the empty string (using Perl regexp syntax, so that .*? is a
"non-greedy" match-everything). If you could have tables within your
table cells, the problem is equivalent to the paren-matching problem
and a regexp isn't powerful enough.
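
Here is a rough Python sketch of both cases; Python's re module accepts
the same .*? non-greedy syntax as Perl. The pattern in TABLE_RE is an
assumption for illustration and not necessarily the one intended above,
and the function names and sample strings are invented. The second
function shows the usual way around the nesting limit: use a regexp only
to find the tags, and keep a depth counter outside the regexp.

```python
import re

# Assumed pattern: from an opening table tag to the nearest closing one.
# DOTALL lets "." cross newlines (Perl's /s); IGNORECASE covers <TABLE>.
TABLE_RE = re.compile(r"<table\b.*?</table\s*>", re.IGNORECASE | re.DOTALL)

# Matches a single opening or closing table tag; group 1 is "/" if closing.
TAG_RE = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)


def strip_tables(html):
    """Non-greedy removal; only correct when tables are not nested."""
    return TABLE_RE.sub("", html)


def strip_tables_nested(html):
    """Remove tables, nested or not, by counting open and close tags."""
    out, depth, pos = [], 0, 0
    for m in TAG_RE.finditer(html):
        if depth == 0:
            out.append(html[pos:m.start()])  # keep text outside any table
        if m.group(1):
            depth = max(depth - 1, 0)  # closing tag
        else:
            depth += 1  # opening tag
        pos = m.end()
    if depth == 0:
        out.append(html[pos:])  # trailing text after the last table
    return "".join(out)


flat = "a<table><tr><td>x</td></tr></table>b<TABLE>y</TABLE>c"
nested = "a<table><td><table>inner</table>LEFTOVER</td></table>b"

flat_result = strip_tables(flat)
nested_regex_result = strip_tables(nested)
nested_counter_result = strip_tables_nested(nested)
```

On the nested input the non-greedy match stops at the inner closing tag,
so the plain substitution leaves the tail of the outer table behind,
while the counting version removes the whole thing.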


