Re: Regular Expressions

Sylvain Schmitz <>
12 Oct 2004 00:53:35 -0400

          From comp.compilers


From: Sylvain Schmitz <>
Newsgroups: comp.compilers
Date: 12 Oct 2004 00:53:35 -0400
Organization: Compilers Central
References: 04-10-069
Keywords: lex
Posted-Date: 12 Oct 2004 00:53:35 EDT


Mark wrote:
> I just can't seem to figure out how to invent a regular expression
> that will strip all HTML tags (except TABLE tags) out of a string and
> leave the rest of the text. When a TABLE tag is encountered i need to
> strip everything under it.
The problem you will run into is nested tables: each opening <table>
tag has to be matched with its corresponding closing </table> tag.
The language of properly nested tags is context-free, not regular, so
no single regular expression can recognize it.

A simple counter will do the trick. Here is a simple example for lex:

%{
#include <stdio.h>

int table_tag_count;    /* current nesting depth of <table> */
int missing_btag;       /* </table> tags seen outside any table */
%}

%option noyywrap
%x TABLE

btable "<table>"
etable "</table>"
tag \<[^>]*\>

%%

{btable} table_tag_count++; BEGIN(TABLE);
{etable} missing_btag++;
<TABLE>{btable} table_tag_count++;
<TABLE>{etable} if (0 == --table_tag_count) BEGIN(INITIAL);
<TABLE>[\n]|. /* ignore */;
{tag} /* ignore */;

%%

int
main (void)
{
      int ret;

      table_tag_count = 0;
      missing_btag = 0;

      ret = yylex();

      if (missing_btag)
          fprintf(stderr, "%d missing opening <table>.\n",
                  missing_btag);
      if (table_tag_count)
          fprintf(stderr, "%d missing ending </table>.\n",
                  table_tag_count);

      return missing_btag + table_tag_count + ret;
}
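
For readers without lex at hand, the same counter idea can be sketched
in plain C. This is only an illustration under a few assumptions: the
function name strip_tags and its buffer-based interface are made up
for this example, and, like the lex rules, it only recognizes the
exact lowercase strings <table> and </table> (a tag with attributes
is treated as an ordinary tag).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the lex example: copies `in` to `out`,
 * dropping every <...> tag and everything between <table> and its
 * matching </table>.  Returns the number of unbalanced table tags
 * (0 when all tables are properly nested), or -1 if outsz is 0. */
int strip_tags(const char *in, char *out, size_t outsz)
{
    static const char btable[] = "<table>";
    static const char etable[] = "</table>";
    int depth = 0;      /* current <table> nesting depth */
    int missing = 0;    /* </table> tags seen outside any table */
    size_t n = 0;       /* bytes written to out so far */

    if (outsz == 0)
        return -1;

    while (*in != '\0') {
        if (strncmp(in, btable, sizeof btable - 1) == 0) {
            depth++;                        /* enter one more table */
            in += sizeof btable - 1;
        } else if (strncmp(in, etable, sizeof etable - 1) == 0) {
            if (depth > 0)
                depth--;                    /* leave one table */
            else
                missing++;                  /* no table was open */
            in += sizeof etable - 1;
        } else if (depth > 0) {
            in++;                           /* inside a table: drop it */
        } else if (*in == '<' && strchr(in, '>') != NULL) {
            in = strchr(in, '>') + 1;       /* skip any other tag */
        } else {
            if (n + 1 < outsz)              /* keep room for the '\0' */
                out[n++] = *in;
            in++;
        }
    }
    out[n] = '\0';
    return depth + missing;
}
```

Called on "a<table>x<table>y</table>z</table>b" with a large enough
buffer, this leaves "ab" in the buffer and returns 0. The counter is
what lets the inner </table> be told apart from the outer one.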

Hope that helps,

