From: | Sylvain Schmitz <schmitz@i3s.unice.fr> |
Newsgroups: | comp.compilers |
Date: | 12 Oct 2004 00:53:35 -0400 |
Organization: | Compilers Central |
References: | 04-10-069 |
Keywords: | lex |
Posted-Date: | 12 Oct 2004 00:53:35 EDT |
Hello,
Mark wrote:
> I just can't seem to figure out how to invent a regular expression
> that will strip all HTML tags (except TABLE tags) out of a string and
> leave the rest of the text. When a TABLE tag is encountered I need to
> strip everything under it.
The problem you will run into is nested tables: each opening <table>
tag has to be matched with its corresponding closing </table> tag.
The language of properly nested tags is context-free but not regular,
so a regular expression alone cannot recognize it.
A simple counter will do the trick. Here is a simple example for lex:
%{
#include <stdio.h>

int table_tag_count;  /* current <table> nesting depth */
int missing_btag;     /* </table> seen outside any table */
%}
%option noyywrap
%x TABLE
btable  "<table>"
etable  "</table>"
tag     \<[^>]*\>
%%
{btable}         table_tag_count++; BEGIN(TABLE);
{etable}         missing_btag++;
<TABLE>{btable}  table_tag_count++;
<TABLE>{etable}  if (0 == --table_tag_count) BEGIN(INITIAL);
<TABLE>[\n]|.    /* ignore everything inside a table */;
{tag}            /* drop other tags; plain text is echoed by default */;
%%
int
main (void)
{
  int ret;

  table_tag_count = 0;
  missing_btag = 0;
  ret = yylex();
  if (missing_btag)
    fprintf(stderr, "%d missing opening <table>.\n",
            missing_btag);
  /* nonzero here means the input ended inside an unclosed table */
  if (table_tag_count)
    fprintf(stderr, "%d missing ending </table>.\n",
            table_tag_count);
  return missing_btag + table_tag_count + ret;
}
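
For comparison, the same counting idea can be written by hand in plain
C, without lex. This is only a sketch under a few assumptions:
strip_tables is a made-up name, it works on an in-memory string rather
than on stdin, it only recognizes the literal lowercase tags <table>
and </table> (as the lex patterns above do), and a stray </table> is
dropped silently instead of being counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the lex scanner.
 * Copies `in` to `out`, dropping every tag and everything between a
 * <table> and its matching </table>.
 * `out` must have room for strlen(in) + 1 bytes.
 * Returns the number of <table> tags still open at end of input. */
int strip_tables(const char *in, char *out)
{
    int depth = 0;                      /* current <table> nesting depth */

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        if (strncmp(in, "<table>", 7) == 0) {
            depth++;                    /* entering a (possibly nested) table */
            in += 7;
        } else if (strncmp(in, "</table>", 8) == 0) {
            if (depth > 0)
                depth--;                /* leaving one nesting level */
            in += 8;                    /* a stray </table> is simply dropped */
        } else if (depth > 0) {
            in++;                       /* inside a table: drop the character */
        } else if (*in == '<') {
            const char *end = strchr(in, '>');
            if (end != NULL)
                in = end + 1;           /* outside tables: skip the whole tag */
            else
                *out++ = *in++;         /* unterminated '<': copy it as text */
        } else {
            *out++ = *in++;             /* ordinary text: keep it */
        }
    }
    *out = '\0';
    return depth;
}
```

Calling strip_tables on a string with two nested tables removes both of
them along with the other tags, and a nonzero return value signals that
the input ended inside a table. The depth variable plays the role of
table_tag_count in the lex version; it is the one piece of state that a
plain regular expression cannot keep.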
--
Hope that helps,
Sylvain