From: ralph@inputplus.demon.co.uk (Ralph Corderoy)
Newsgroups: comp.compilers
Date: 27 Dec 2001 00:11:08 -0500
Organization: InputPlus Ltd.
References: 01-12-140
Keywords: parse
Posted-Date: 27 Dec 2001 00:11:08 EST
Hi Ian,
> [There is an official grammar for HTML, but it bears remarkably little
> relationship to the actual sloppy error-filled HTML that most web
> browsers manage to interpret. -John]
You could consider passing the HTML through Raggett's Tidy first, so
that you have an easier job of parsing it. It depends on whether that's
allowed for your assignment.
http://www.w3.org/People/Raggett/tidy/
Ralph.