Re: parsing html?

Brock <>
24 Dec 2001 00:08:08 -0500

          From comp.compilers


From: Brock <>
Newsgroups: comp.compilers
Date: 24 Dec 2001 00:08:08 -0500
Organization: Compilers Central
References: 01-12-140
Keywords: parse
Posted-Date: 24 Dec 2001 00:08:08 EST

|[There is an official grammar for HTML, but it bears remarkably little
|relationship to the actual sloppy error-filled HTML that most web
|browsers manage to interpret. -John]

I recently decided to parse some HTML in a small project (see ) and I have a question.

Instead of parsing full HTML, I just wanted to parse balanced tags,
with explicit exceptions (tags whose end tags, if present, would be
ignored). After playing with the grammar for a while, for some reason I
decided to just parse out a stream of tags and text in a yacc-like way
and then use a function to break the stream up into trees.
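A minimal OCaml sketch of that second step (turning a flat stream of tags and text into trees) may help make the question concrete. It is not the code from the project; the `token` and `tree` types, the `empty_tags` exception list, and the recovery policy (close unclosed elements implicitly, drop stray end tags) are all assumptions for illustration. It uses a stack of open elements:

```ocaml
(* Hypothetical flat token stream, as a yacc-like pass might produce it. *)
type token =
  | Start of string (* <tag>; name assumed already lowercased by the lexer *)
  | End of string   (* </tag> *)
  | Text of string

type tree =
  | Node of string * tree list
  | Leaf of string

(* Illustrative exception list: these tags never take content, and an
   explicit end tag for them is ignored. *)
let empty_tags = [ "br"; "hr"; "img"; "meta"; "link"; "input" ]
let is_empty t = List.mem t empty_tags

(* The stack holds open elements as (name, children in reverse order);
   the bottom frame, named "", collects the top-level forest. *)
let close_top = function
  | (name, kids) :: (pname, pkids) :: rest ->
      (pname, Node (name, List.rev kids) :: pkids) :: rest
  | stack -> stack (* never pop the root frame *)

let add child = function
  | (name, kids) :: rest -> (name, child :: kids) :: rest
  | [] -> [ ("", [ child ]) ]

(* Is an element named [t] open somewhere above the root frame? *)
let rec is_open t = function
  | [] | [ _ ] -> false
  | (n, _) :: more -> n = t || is_open t more

(* Close elements until the innermost open [t] has been closed;
   anything opened inside it and never closed is closed implicitly. *)
let rec close_until t stack =
  match stack with
  | (n, _) :: _ :: _ when n = t -> close_top stack
  | _ :: _ :: _ -> close_until t (close_top stack)
  | _ -> stack

let build_forest tokens =
  let rec go stack = function
    | Text s :: rest -> go (add (Leaf s) stack) rest
    | Start t :: rest when is_empty t -> go (add (Node (t, [])) stack) rest
    | Start t :: rest -> go ((t, []) :: stack) rest
    | End t :: rest when is_empty t -> go stack rest
    | End t :: rest ->
        (* an end tag with no matching open element is dropped *)
        go (if is_open t stack then close_until t stack else stack) rest
    | [] -> (
        (* end of input: close whatever is still open *)
        match stack with
        | [ (_, kids) ] -> List.rev kids
        | [] -> []
        | _ -> go (close_top stack) [])
  in
  go [ ("", []) ] tokens
```

For example, `build_forest [Start "div"; Start "b"; Text "x"; End "div"; End "i"]` yields `[Node ("div", [Node ("b", [Leaf "x"])])]`: the unclosed `<b>` is closed by `</div>` and the stray `</i>` is ignored. That tolerance is what makes this step easy to write as a separate function.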

Point being, I don't like it this way and think it should all be done in
the yacc step. If any of you get a chance, could you look over my
grammar (contained in parser.mly) and possibly the functions (in ) and
give me some ideas of where I went wrong (or why the way I did it is
good)? Or perhaps I should extract the core grammar and post that...
maybe I will do that in a few days.
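For comparison, here is a sketch of doing the balanced-tag grammar in the parsing step itself. To keep it self-contained it is a hand-written recursive-descent parser in OCaml (one function per nonterminal) rather than a .mly file, and the token types, `empty_tags` list, and `Mismatch` exception are hypothetical. One point worth noting: with a generic START/END token, "the end tag must match the start tag" is not something the grammar itself expresses, so it has to be checked in the semantic action, just as it would be in parser.mly.

```ocaml
(* Hypothetical token type, same shape a lexer might hand to a parser. *)
type token =
  | Start of string
  | End of string
  | Text of string

type tree =
  | Node of string * tree list
  | Leaf of string

exception Mismatch of string

let empty_tags = [ "br"; "hr"; "img" ] (* illustrative exception list *)

(* Grammar being mirrored:
     forest ::= item*
     item   ::= TEXT | START(empty) | END(empty) | START(t) forest END(t)
   [forest] returns the parsed items and the unconsumed tokens. *)
let rec forest toks acc =
  match toks with
  | Text s :: rest -> forest rest (Leaf s :: acc)
  | Start t :: rest when List.mem t empty_tags ->
      forest rest (Node (t, []) :: acc)
  | End t :: rest when List.mem t empty_tags ->
      forest rest acc (* end tags of the exceptions are ignored *)
  | Start t :: rest -> (
      let kids, rest' = forest rest [] in
      (* semantic check: the closing name must match the opening one *)
      match rest' with
      | End t' :: rest'' when t' = t -> forest rest'' (Node (t, kids) :: acc)
      | End t' :: _ -> raise (Mismatch (Printf.sprintf "</%s> closes <%s>" t' t))
      | _ -> raise (Mismatch ("unclosed <" ^ t ^ ">")))
  | (End _ :: _ | []) as rest -> (List.rev acc, rest)

let parse toks =
  match forest toks [] with
  | trees, [] -> trees
  | _, End t :: _ -> raise (Mismatch ("stray </" ^ t ^ ">"))
  | _, _ -> assert false (* forest stops only at End or end of input *)
```

The trade-off shows up in the error cases: this version rejects `<b><i></b>` outright, whereas sloppy real-world markup needs recovery (error productions or implicit closing), which is one plausible argument for keeping a separate tree-building pass.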

Anyway, the balanced-tag grammar would work great for the above-mentioned
HTML parser (with awareness of comments, normal text, and one or two
other things).

