Re: Q1. reducing grammar redundancy

"Vidar Hokstad" <vidar@hokstad.name>
28 Feb 2005 00:47:42 -0500

          From comp.compilers

Related articles
Q1. reducing grammar redundancy spam@abelectron.com (valentin tihomirov) (2005-02-20)
Re: Q1. reducing grammar redundancy wyrmwif@tsoft.org (SM Ryan) (2005-02-28)
Re: Q1. reducing grammar redundancy vidar@hokstad.name (Vidar Hokstad) (2005-02-28)
Re: Q1. reducing grammar redundancy branco.medeiros@gmail.com (2005-02-28)
Re: Q1. reducing grammar redundancy spam@abelectron.com (valentin tihomirov) (2005-02-28)
| List of all articles for this month |

From: "Vidar Hokstad" <vidar@hokstad.name>
Newsgroups: comp.compilers
Date: 28 Feb 2005 00:47:42 -0500
Organization: http://groups.google.com
References: 05-02-086
Keywords: parse
Posted-Date: 28 Feb 2005 00:47:42 EST

valentin tihomirov wrote:


> Call me an optimization paranoid but...I have a question regarding
> the issue of grammar generalization to be used in language tools.


[snipped lisp example]
> [2] The issue conserning SGML redundancies. Please consider a basic XML
> element:
> element -> OPEN_PAREN "element" CLOSE_PAREN // opening tag
> element // body
> OPEN_PAREN SLASH "element" CLOSE_PAREN; // closing tag
> As you see, the closing tag duplicates the appearance of "element"
> keyword. I may mistake easily here, looks like a context-sensetive
> rule is needed here in order to avoid the repetition. How can it be
> written in more compact way? How can it be written in a more optimal
> form acceptable by parser?


A simple, general approach to removing this kind of redundancy where
it makes sense is to push some checks out of the parser itself. For
XML for instance, it's highly unusual to write specific parsers for
specific XML vocabularies. Instead you'd normally use a generic parser
to handle the overal syntax of the tags and other markup, and enforce
the pairing and nesting of elements without expressing them directly
in the formal grammar.


This is what allows for the flexibility of defining languages for
dynamically specifying the grammar while still allowing a wide variety
of tools to parse vocabularies they don't understand the meaning of.


But the big question is why you want to write this in a more compact
way? If you're going to hardwire an XML parser, writing someting like:


element: "<element>" element_content "</element>"


seems to me at least to be far preferrable to making the grammar
unreadable just because you want to remove a few characters here and
there.


Premature optimization and all that... Is there a specific problem that
make you want to do this?


Vidar


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.