Re: Parsing HTML : I would appreciate advice

Daniel Zingaro <zingard@mcmaster.ca>
15 Nov 2006 00:10:08 -0500

          From comp.compilers

From: Daniel Zingaro <zingard@mcmaster.ca>
Newsgroups: comp.compilers
Date: 15 Nov 2006 00:10:08 -0500
Organization: Compilers Central
References: 06-11-059
Keywords: parse
Posted-Date: 15 Nov 2006 00:10:08 EST

Hi,


A pedagogical XML parser I wrote in Pascal can be found at
http://www.cas.mcmaster.ca/~zingard/xmlparser.zip


HTML can be parsed similarly. ... Of course, this is only worthwhile
if you feel like spending time on a problem that has been solved over
and over before, as John noted =).


Thanks,
Dan


At 04:31 PM 11/13/2006, you wrote:
>The problem to solve.
>
>I have to parse millions of html documents, and return just the
>plaintext/bytes. Many of the html documents contain Japanese
>characters and so it will be necessary to read the codepage in the
>html header, so the bytes can be read properly. ...
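
For what it's worth, the task quoted above (find the charset declared in
the HTML header, decode the bytes, keep only the text) can be sketched
with Python's standard library. This is only a rough illustration and not
the approach used in the Pascal parser linked above. The names
sniff_charset, TextExtractor and html_to_text are made up for this
example, and the 4096-byte scan window and UTF-8 fallback are assumptions.
Real-world pages with missing or wrong charset declarations would need
more care.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper names, chosen for this sketch only.
# Matches charset=... as found in <meta> tags, e.g.
# <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
_CHARSET_RE = re.compile(
    rb'charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared near the top of the document, if any."""
    # Assumption: the declaration appears within the first 4096 bytes.
    match = _CHARSET_RE.search(raw[:4096])
    return match.group(1).decode("ascii") if match else default


class TextExtractor(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(raw: bytes) -> str:
    """Decode raw HTML bytes using the declared charset and strip markup."""
    charset = sniff_charset(raw)
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown codec name in the document: fall back to UTF-8.
        text = raw.decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join(parser.chunks)
```

Calling html_to_text(data) on the raw bytes of one document returns its
text chunks joined by single spaces. For millions of documents, a
tolerant, well-tested HTML library is probably a better starting point
than a hand-rolled extractor like this one.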

