Re: HTML grammar

Ralf Gerlich <Ralf.Gerlich@t-online.de>
20 Sep 1999 12:01:35 -0400

          From comp.compilers

From: Ralf Gerlich <Ralf.Gerlich@t-online.de>
Newsgroups: comp.compilers
Date: 20 Sep 1999 12:01:35 -0400
Organization: T-Online
References: 99-09-059
Keywords: parse

Hi!


> I am trying to build an HTML parser, please if somebody has already
> written an HTML grammar send it to me!.
You can probably find a definition of the HTML "grammar" on the W3C
site (www.w3c.org).


In fact, HTML has a rather sloppy grammar. Parsing is normally best
done in two stages:


1. First decide which parts of the input are text and which are tags.
Parse each tag by splitting its contents into the tag name and its
arguments.


2. Next you need a mechanism for handling those "erroneous" constructs
(which are in fact permitted by the grammar).
For that you need a definition for each type of block that contains
this data:
   1. the name of the start tag
   2. whether it is a block (just think of IMG tags: they have no "closer")
   3. whether the start tag, the end tag, or both may be omitted
(For an example of such a definition, have a look at how SGML or XML
work.)
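
Stage 1 above could be sketched roughly as follows in Python. This is only an illustration under simplifying assumptions: the names tokenize and parse_attrs are made up here, a ">" inside a quoted argument is not handled, and comments or declarations such as <!-- ... --> simply end up as text.

```python
import re

# A tag is "<", an optional "/", a name, then everything up to the next ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
# An argument is a name, optionally followed by "=" and a quoted or bare value.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?"""
)

def parse_attrs(raw):
    """Split the inside of a tag into a dict of argument names and values."""
    attrs = {}
    for name, value in ATTR_RE.findall(raw):
        if value[:1] in ('"', "'"):
            value = value[1:-1]          # strip the surrounding quotes
        attrs[name.lower()] = value      # arguments without a value map to ""
    return attrs

def tokenize(html):
    """Yield ("text", s), ("start", name, attrs) or ("end", name) tuples."""
    pos = 0
    for m in TAG_RE.finditer(html):
        if m.start() > pos:
            yield ("text", html[pos:m.start()])
        closing, name, rest = m.groups()
        if closing:
            yield ("end", name.lower())
        else:
            yield ("start", name.lower(), parse_attrs(rest))
        pos = m.end()
    if pos < len(html):
        yield ("text", html[pos:])
```

Calling list(tokenize('<P>Hi <IMG SRC="a.png">')) gives a start token for p, a text token, and a start token for img carrying its src argument. Nothing at this stage knows which tags need a closer; that is left to stage 2.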


From these definitions you can then generate a "parser" that
resynchronizes itself by implicitly inserting missing start and end
tags where possible.
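
A minimal Python sketch of such a table-driven pass, under stated assumptions: it works on already tokenized input in the form ("start", name, args), ("end", name) and ("text", s), it only inserts missing end tags (inserting missing start tags is left out), and the closed_by field as well as the names ElementDef, DEFS and normalize are inventions for this illustration. The table covers only a handful of tags.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementDef:
    name: str
    is_block: bool = True               # False: no closer at all (e.g. IMG)
    end_optional: bool = False          # the end tag may be omitted
    closed_by: frozenset = frozenset()  # start tags that imply our end tag

# Deliberately tiny table; a real one would be derived from the DTD.
DEFS = {d.name: d for d in (
    ElementDef("img", is_block=False),
    ElementDef("p", end_optional=True, closed_by=frozenset({"p", "ul", "li"})),
    ElementDef("li", end_optional=True, closed_by=frozenset({"li"})),
    ElementDef("ul"),
)}

def lookup(name):
    """Unknown tags are treated as ordinary blocks with a required closer."""
    return DEFS.get(name, ElementDef(name))

def normalize(tokens):
    """Return the tokens with omitted end tags made explicit."""
    out, stack = [], []
    for tok in tokens:
        if tok[0] == "start":
            name = tok[1]
            # A new start tag may implicitly end the elements open above it.
            while stack:
                top = lookup(stack[-1])
                if not (top.end_optional and name in top.closed_by):
                    break
                out.append(("end", stack.pop()))
            out.append(tok)
            if lookup(name).is_block:
                stack.append(name)
        elif tok[0] == "end":
            name = tok[1]
            if name in stack:            # stray end tags are dropped
                while True:
                    popped = stack.pop()
                    out.append(("end", popped))
                    if popped == name:
                        break
        else:
            out.append(tok)
    while stack:                         # close whatever is still open
        out.append(("end", stack.pop()))
    return out
```

Given the tokens for <ul><li>a<li>b</ul>, normalize inserts an end tag for the first li before the second one starts and another before the closing ul, so a strict parser downstream only ever sees balanced tags.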


A good example of this _may_ be SGMLtools (http://www.sgmltools.org/).
They have C code which _may_ help you. (I haven't had a look at it yet,
but they do in fact "pretty print" the SGML code according to a
definition, adding missing start and end tags where possible, and thus
produce correct "code" to send to the real parser.)


I hope this helps a bit. (Sorry I didn't go into more depth, but I don't
have much time to answer, and this is only an idea of mine which has
not been tested or implemented in any way yet.)


Ciao,
Ralf


--
Ralf Gerlich Ralf.Gerlich@t-online.de
Passionate programmer http://www.d-design.net/rgerlich/

