Re: parsing html?

ralph@inputplus.demon.co.uk (Ralph Corderoy)
27 Dec 2001 00:11:08 -0500

          From comp.compilers

From: ralph@inputplus.demon.co.uk (Ralph Corderoy)
Newsgroups: comp.compilers
Date: 27 Dec 2001 00:11:08 -0500
Organization: InputPlus Ltd.
References: 01-12-140
Keywords: parse
Posted-Date: 27 Dec 2001 00:11:08 EST

Hi Ian,


> [There is an official grammar for HTML, but it bears remarkably little
> relationship to the actual sloppy error-filled HTML that most web
> browsers manage to interpret. -John]


You could pass the HTML through Dave Raggett's HTML Tidy first; it
cleans up sloppy markup, so you have an easier job of parsing. Whether
that's an option depends on what's allowed for your assignment.


        http://www.w3.org/People/Raggett/tidy/
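
As a rough illustration of the idea, here is a minimal Python sketch that pipes a document through the tidy command-line tool before handing it to a strict parser. It assumes a `tidy` executable is on the PATH. The helper names `tidy_html` and `is_well_formed` are made up for this example, and falling back to the untouched input when tidy is missing or reports errors is just one possible choice, not something tidy itself prescribes.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def tidy_html(html, exe="tidy"):
    """Return html cleaned by HTML Tidy, or unchanged if exe is not on PATH."""
    if shutil.which(exe) is None:
        # Assumption for this sketch: fall back rather than fail.
        return html
    result = subprocess.run(
        # -q: quiet, -asxml: emit well-formed XHTML,
        # -numeric: numeric entities, -utf8: UTF-8 input and output
        [exe, "-q", "-asxml", "-numeric", "-utf8", "--show-warnings", "no"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if result.returncode > 1 or not result.stdout:
        return html
    return result.stdout.decode("utf-8")


def is_well_formed(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

A strict parser rejects typical sloppy input such as `<p>one<p>two`, which is why a clean-up pass helps: the parser you write only has to cope with the regular output of `tidy_html`, not with everything browsers tolerate.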




Ralph.

