Re: Yacc grammar for HTML/XML/WML

Pierre Mai <dent@dent.isdn.cs.tu-berlin.de>
20 Jul 1998 17:01:47 -0400

          From comp.compilers

Related articles
Yacc grammar for HTML/XML/WML terry.robinson@ibm.net (Terry Robinson) (1998-07-10)
Re: Yacc grammar for HTML/XML/WML qjackson@wave.home.com (Quinn Tyler Jackson) (1998-07-11)
Re: Yacc grammar for HTML/XML/WML mw@ipx2.rz.uni-mannheim.de (1998-07-13)
Re: Yacc grammar for HTML/XML/WML dent@dent.isdn.cs.tu-berlin.de (Pierre Mai) (1998-07-20)
From: Pierre Mai <dent@dent.isdn.cs.tu-berlin.de>
Newsgroups: comp.compilers
Date: 20 Jul 1998 17:01:47 -0400
Organization: Technical University of Berlin, Germany
References: 98-07-112
Keywords: WWW, parse
X-PGP-Fingerprint: 17 2D 00 93 8B C8 57 57 A7 D7 CD E9 3A EA 6E 4C

mw@ipx2.rz.uni-mannheim.de (Marc Wachowitz) writes:


> Terry Robinson <terry.robinson@ibm.net> wrote:
> > Does anyone have a grammar for Yacc/Bison for a real mark-up language
> > like HTML or WML (XML needs a document type definition to define a
> > language - well normally) or know where one can be gotten ?
>
> Just in case "Yacc/Bison" is merely your assumption how a parser would
> be written, while the real problem is just to get some parser for these
> languages: As long as the text follows a DTD, you could use nsgmls or
> directly the underlying C++ interface of SP, James Clark's SGML parser:
> http://www.jclark.com/


One should also note that, at least for SGML, constructing a correct
parser is a rather non-trivial exercise, complicated by the fact that
the syntax and semantics of full SGML are not a good match for most
"conventional" parsing strategies and tools used in the programming
language community (white-space handling in particular is likely to
pose a problem for yacc/bison).
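
To make the white-space point concrete, here is a minimal sketch in
Python. It is not taken from any of the parsers mentioned in this
thread, and the function name tokenize is made up for illustration. It
only handles a simplified XML-like syntax (no comments, no quoted
attribute values containing spaces or ">", nothing SGML-specific). The
point is that the scanner has to know whether it is inside a tag before
it can decide what white space means, whereas the usual lex/yacc setup
discards white space globally before the grammar ever sees it.

```python
def tokenize(markup):
    """Split simplified XML-like markup into tokens.

    White space is handled by context: inside a tag it only separates
    the name from the attributes, outside a tag it is character data
    and must be preserved.
    """
    tokens = []
    pos = 0
    while pos < len(markup):
        if markup[pos] == "<":
            # Tag mode: runs of white space are separators and are dropped.
            # Simplification: assumes no ">" or spaces inside attribute values.
            end = markup.index(">", pos)
            tokens.append(("TAG", markup[pos + 1:end].split()))
            pos = end + 1
        else:
            # Content mode: white space is part of the text and is kept.
            end = markup.find("<", pos)
            if end == -1:
                end = len(markup)
            tokens.append(("TEXT", markup[pos:end]))
            pos = end
    return tokens


print(tokenize("<p  class='x'>a  b</p>"))
```

The two spaces inside the start tag disappear, while the two spaces
between "a" and "b" survive as content. Full SGML adds further
context-dependent rules on top of this, which this sketch does not
attempt to model.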


Parsing XML is probably an order of magnitude simpler (ease of parsing
was one of the design criteria for XML), but it is still not a very
good match for yacc/bison and similar tools.


Overall, you are much, much better off using one of the many available
XML parsers, e.g. Ælfred (in Java) or nsgmls (C++), which also
does full SGML, and HyTime, and ... as well.


<quote>
Ælfred is free for both commercial and non-commercial use, and COMES
WITH NO WARRANTEE. You can download a copy of version 1.0 (with
source code) from the following URL:


    http://www.microstar.com/XML/index.htm
</quote>


(Beware: this quote is somewhat old, so the terms of use or
availability may have changed.)
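
As an illustration of the "use an existing parser" advice, here is a
short sketch in Python. It uses xml.etree.ElementTree from the Python
standard library, which is neither Ælfred nor nsgmls, and the WML-like
sample document and the function name list_paragraphs are invented for
this example. ElementTree is a non-validating XML parser, so it checks
well-formedness only and will not accept SGML-style HTML with omitted
tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical WML-like sample document, invented for illustration.
SAMPLE = """<wml>
  <card id="main" title="Hello">
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </card>
</wml>"""


def list_paragraphs(xml_text):
    """Return (card id, paragraph text) pairs from an XML string."""
    # fromstring() raises ET.ParseError if the input is not well-formed.
    root = ET.fromstring(xml_text)
    return [(card.get("id"), p.text)
            for card in root.iter("card")
            for p in card.iter("p")]


print(list_paragraphs(SAMPLE))
```

The whole job of tokenizing, matching tags, and handling white space is
left to the library; the application code only walks the resulting
tree.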


Regs, Pierre.


--
Pierre Mai <dent@cs.tu-berlin.de> http://home.pages.de/~trillian/
--

