|Q1. reducing grammar redundancy email@example.com (valentin tihomirov) (2005-02-20)|
|Re: Q1. reducing grammar redundancy firstname.lastname@example.org (SM Ryan) (2005-02-28)|
|Re: Q1. reducing grammar redundancy email@example.com (Vidar Hokstad) (2005-02-28)|
|Re: Q1. reducing grammar redundancy firstname.lastname@example.org (2005-02-28)|
|Re: Q1. reducing grammar redundancy email@example.com (valentin tihomirov) (2005-02-28)|
|From:||"Vidar Hokstad" <firstname.lastname@example.org>|
|Date:||28 Feb 2005 00:47:42 -0500|
valentin tihomirov wrote:
> Call me an optimization paranoid but...I have a question regarding
> the issue of grammar generalization to be used in language tools.
[snipped lisp example]
>  The issue conserning SGML redundancies. Please consider a basic XML
> element -> OPEN_PAREN "element" CLOSE_PAREN // opening tag
> element // body
> OPEN_PAREN SLASH "element" CLOSE_PAREN; // closing tag
> As you see, the closing tag duplicates the appearance of "element"
> keyword. I may mistake easily here, looks like a context-sensetive
> rule is needed here in order to avoid the repetition. How can it be
> written in more compact way? How can it be written in a more optimal
> form acceptable by parser?
A simple, general approach to removing this kind of redundancy where
it makes sense is to push some checks out of the parser itself. For
XML for instance, it's highly unusual to write specific parsers for
specific XML vocabularies. Instead you'd normally use a generic parser
to handle the overal syntax of the tags and other markup, and enforce
the pairing and nesting of elements without expressing them directly
in the formal grammar.
This is what allows for the flexibility of defining languages for
dynamically specifying the grammar while still allowing a wide variety
of tools to parse vocabularies they don't understand the meaning of.
But the big question is why you want to write this in a more compact
way? If you're going to hardwire an XML parser, writing someting like:
element: "<element>" element_content "</element>"
seems to me at least to be far preferrable to making the grammar
unreadable just because you want to remove a few characters here and
Premature optimization and all that... Is there a specific problem that
make you want to do this?
Return to the
Search the comp.compilers archives again.