Adaptive Grammars and XML - Paper available for review

Quinn Tyler Jackson <qjackson@shaw.ca>
23 May 2002 01:27:17 -0400

          From comp.compilers


From: Quinn Tyler Jackson <qjackson@shaw.ca>
Newsgroups: comp.compilers
Date: 23 May 2002 01:27:17 -0400
Organization: Compilers Central
Keywords: parse, journal, available
Posted-Date: 23 May 2002 01:27:17 EDT

The following paper is now available for pre-preprint review in PDF format.
Please note your institutional affiliation when you request a copy.


Requests to: qjackson@shaw.ca


Noesis-E, June 2002, Vol. 2, No. 2


Efficient Formalism-Only Parsing of XML/HTML Using the §-Calculus


Quinn Tyler Jackson
Jackson Solutions, Port Coquitlam, British Columbia
qjackson@shaw.ca


Keywords: XML, parsing, adaptive grammars, §-Calculus, Meta-S Grammar
Development System


Abstract:


Traditionally, correct parsing of XML and HTML has been littered with
semantic hacks in the parsing code to deal with the oddities of these
languages, since HTML accepts unbalanced tags and tags that do not
match in case, but XML is less forgiving. The detection of
well-formedness of XML documents has, to date, required semantic
analysis outside of the grammar specification. We present a
grammar-only (HT|X)ML parser that, upon detecting that it is parsing
XML, modifies itself dynamically to ensure that the document
conforms to XML’s stricter rules. Our grammar detects unbalanced
tags in XML, as well as mismatched case in otherwise balanced tags.
At the same time, it requires attribute values in XML document tags
to be quoted, while accepting the looser attribute syntax in an
HTML document. On a 733 MHz Windows 2000 machine, our parser
performed a well-formedness-detecting parse on XML documents such as
the KJV Old Testament at a rate of 84 Kb/second, Austen’s Pride and
Prejudice at a rate of 98 Kb/second, and Wolfgang May’s Mondial 3.0
database at a rate of 109 Kb/second.
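
For readers unfamiliar with the three checks the abstract names
(balanced tags, matching case, quoted attribute values), the rough
Python sketch below illustrates them. It is not the paper's method:
the paper does this inside an adaptive grammar, whereas this sketch
uses exactly the kind of hand-written checking code the abstract
argues against. The function name check, the regular expressions, and
the use of a leading "<?xml" declaration as the trigger for strict
mode are all assumptions made for illustration.

```python
import re

# Matches an opening, closing, or self-closing tag and captures its parts.
# Simplification: a ">" inside a quoted attribute value is not handled.
TAG = re.compile(r"<(/?)([A-Za-z][\w:.-]*)([^<>]*?)(/?)>")
# Matches one attribute: name = "quoted" | 'quoted' | bare
ATTR = re.compile(r"""([A-Za-z_:][\w:.-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def check(document):
    """Return a list of problems found; an empty list means the checks passed."""
    # Assumption: a leading XML declaration switches on the strict rules.
    strict = document.lstrip().startswith("<?xml")
    problems = []
    stack = []  # names of the elements currently open
    for match in TAG.finditer(document):
        closing, name, attrs, self_closing = match.groups()
        if not strict:
            continue  # HTML mode: tolerate unbalanced tags and bare attributes
        if closing:
            if not stack:
                problems.append(f"unexpected </{name}>")
            elif stack[-1] == name:
                stack.pop()
            elif stack[-1].lower() == name.lower():
                problems.append(f"case mismatch: <{stack[-1]}> closed by </{name}>")
                stack.pop()
            else:
                problems.append(f"unbalanced: <{stack[-1]}> closed by </{name}>")
        else:
            for _, value in ATTR.findall(attrs):
                if value[0] not in "\"'":
                    problems.append(f"unquoted attribute value in <{name}>")
            if not self_closing:
                stack.append(name)
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems
```

Under these assumptions, check('<?xml version="1.0"?><a></A>') reports
a case mismatch, while the same tags without the XML declaration are
treated as HTML and accepted.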


--
Quinn Tyler Jackson
http://QuinnTylerJackson.n3.net/

