Related articles |
---|
SGML to YACC j_mcarthur@BIX.com (1994-05-26) |
Re: SGML to YACC vosse@ruls41.LeidenUniv.nl (1994-05-30) |
Re: SGML to YACC glyph@netcom.com (1994-05-31) |
Anouncement: ETO, an object oriented universal syntax checker china@bernina.ethz.ch (1994-06-03) |
Newsgroups: | comp.compilers |
From: | j_mcarthur@BIX.com (Jeffrey McArthur) |
Keywords: | parse, question |
Organization: | Galahad |
Date: | Thu, 26 May 1994 02:45:59 GMT |
I have been working on a strange project involving Yacc and SGML. What I
want to be able to do is compile a SGML Document Tag Definition into a
YACC grammar.
I have done some of this. A lot of SGML is straightforward. But then I
come to what SGML fondly refers to as exceptions. For any tag/token there
is a set of context dependant exceptions to the grammar. I get a bit lost
on how to easilly reduce this to Yacc's grammar.
For example:
<!ELEMENT foo - - (a, b*, c) >
Is the declaration for the tag/token foo. It says that foo is defined as
the sequence of tags/tokens: exactly one <a> token, any number of <b>
tokens, followed by one <c> token. SGML uses the same regular expression
syntax as Lex.
I convert the above declaration into this:
sy_foo : sy_a b_ast sy_c
;
b_plus : sy_b
| b_plus sy_b
;
b_ast : /* empty */
| b_plus
;
The problem is exceptions make the whole thing a lot more complicated.
For example:
<!ELEMENT foo - - (a, b*, c) +(d | e) >
Means that the tags/tokens <d> and <e> can occur anywhere. My first pass
was to do something like this:
sy_foo : foo_inc_ast sy_a foo_inc_ast b_ast foo_inc_ast sy_c
foo_inc_ast
;
b_plus : sy_b
| b_plus sy_b
;
b_ast : /* empty */
| b_plus
;
sy_foo_inc : sy_d
| sy_e
;
foo_inc_plus : sy_foo_inc
| foo_inc_plus sy_foo_inc
;
foo_inc_ast : /* empty */
| foo_inc_plus
;
Unfortunately that does not work. Since the sequence:
<a><b><d><b><b><c>
should also be allowed inside of <foo>. This means the grammar actually
has to be something like this:
sy_foo : foo_inc_ast sy_a foo_inc_b_ast sy_c foo_inc_ast
;
sy_foo_inc_b : sy_foo_inc
| sy_b
;
foo_inc_b_plus : sy_foo_inc_b
| foo_inc_b_plus sy_foo_inc_b
;
foo_inc_b_ast : /* empty */
| foo_inc_b_plus
;
sy_foo_inc : sy_d
| sy_e
;
foo_inc_plus : sy_foo_inc
| foo_inc_plus sy_foo_inc
;
foo_inc_ast : /* empty */
| foo_inc_plus
;
Then of course there are the problem of the exclusion exceptions. But I
think I have presented the idea.
The problem I am having is the grammar grows almost exponentially when
exceptions are fully implemented. Or am I missing some simple construct?
----
Jeffrey McArthur ATLIS Publishing
phone: (301) 210-6655 12001 Indian Creek Court
fax: (301) 210-4999 Beltsville, MD 20705
home: (410) 290-6935 email: j_mcarthur@bix.com
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.