Re: best grammar for handling optional symbols?

Max Hailperin <max@gustavus.edu>
Sat, 16 Aug 2008 07:22:19 -0500

          From comp.compilers

Related articles
best grammar for handling optional symbols? jmichae3@yahoo.com (2008-08-15)
Re: best grammar for handling optional symbols? max@gustavus.edu (Max Hailperin) (2008-08-16)
Re: best grammar for handling optional symbols? jaluber@gmail.com (Johannes) (2008-08-17)
Re: best grammar for handling optional symbols? jmichae3@yahoo.com (2008-08-18)
Re: best grammar for handling optional symbols? cfc@shell01.TheWorld.com (Chris F Clark) (2008-08-19)
| List of all articles for this month |

From: Max Hailperin <max@gustavus.edu>
Newsgroups: comp.compilers
Date: Sat, 16 Aug 2008 07:22:19 -0500
Organization: Compilers Central
References: 08-08-021
Keywords: parse
Posted-Date: 16 Aug 2008 10:53:24 EDT
X-Usenet-Provider: http://www.giganews.com

jmichae3@yahoo.com writes:


> I've been using BNF.
> up until now, things have been OK with the optional symbols.
>
> what is the proper way to implement an optional symbol where <code40>
> is optional?
> <b> ::= <code40> "r" | "r"
> but I would prefer
> <b> ::= <a> "r"
> <a> ::= <code40> | <nullsymbol???>
>
> I don't know. I don't think this is right when it occurs 0 or 1
> times. I seem to remember there was a symbol in BNF used to represent
> the null terminal or something - it's not listed in wikipedia (lousy
> wikipedia). what was it?


I suspect you may be remembering two different things, either of which
might be useful to you:


(1) If you use your second option, then you are looking for some
conventional symbol to use for the empty string, in the place where
you have <nullsymbol???>. In grammars that are written in relatively
mathematical contexts, the conventional symbol to use for this is the
Greek letter epsilon. On the other hand, people who use the Greek
letter also tend to use an arrow symbol where you have ::=, etc. So,
what do people do who are limiting themselves to the ASCII character
set? The answer is that most adopt some particular nonterminal like
<epsilon> or <empty> or <null> to mean the same thing -- but there is
no standardization of this. Whichever one you use, you don't need to
just rely on readers to understand it, nor do you need to include a
prose explanation. Instead, you can define your nonterminal using a
production with a totally empty right-hand side:


    <nullsymbol> ::=


Of course, that may need a prose explanation too, depending on your
audience. But the good news is that it appears only once in your
grammar, and elsewhere you use the relatively self-documenting
<nullsymbol> (or whatever you call it) as opposed to having the whole
grammar strewn with productions with empty right-hand sides, as in


    <a> ::= <code40> |


where the second alternative for <a> is empty, but looks like a mistake.


(2) The other thing you may be remembering is that there are extended
versions of BNF, known as EBNF, where additional notations can be
included on the right-hand side of productions in order to indicate
such matters as optional portions. This would allow a third
alternative way of writing your grammar, conceptually similar to your
third one but a little tidier, because it doesn't require introducing
the nonterminal <a>. Unfortunately, there is again no standardization
of EBNF notation. The two most common ways of indicating optional
elements are probably enclosing them in square brackets or following
them with question marks. So, your example would wind up looking like


    <b> ::= [<code40>] "r"


or


    <b> ::= <code40>? "r"


>
> Anyway, the problem I am up against is that I have to define a symbol
> that has 18 options. fully expanded, that makes 2^18 options, and I
> have 200 more symbols to define for autocad, many with more options...
>
> [Either is OK for optional symbols. For your Autocad problem, I would
> suggest a much looser grammar that just accepts a list of symbols, and
> check semantically to knock out duplicates. That both keeps the
> grammar managable, and lets you produce much better errors, e.g.
> "duplicate foo" or "foo not allowed after bar" rather than just
> "syntax error". -John]


John's use of the phrase "produce much better errors" sounds like he
thinks you are using your grammar to produce a parser, as opposed to
just writing the grammar down as documentation. I'm not clear whether
that is in fact the case. But even if you are just writing the
grammar down for humans to read, there's a lot to what John says.


John's sugestion is particularly apt if the options can appear in any
order -- which I suspect is the case. In that case, you have far more
than 2^18 fully-expanded possibilities -- not that you would want to
write even 2^18 productions. Moreover, there is no nice trick of BNF
or EBNF that would come to your rescue.


Your approach using <a> to represent an optional <code40>, defined
using <nullsymbol> (or some other notation for the empty string) would
work OK for 18 options, which need to appear in a fixed order if used.
It would be mildly clutzy, since you'd need 18 new nonterminals like
your <a>. (I'd name them <optionalCode40> and so forth.)


If you are fortunate enough to use EBNF, but again have a required
fixed order, the clutziness would go away. You could just write the
18 options out with each in brackets or followed by a question mark,
and life would be good. Or to adopt John's phrase, you would have
kept your grammar manageable.


But nothing like that is possible if the order is variable. In that
case, even if you are writing for humans, your best bet is to escape
from the BNF or EBNF into some more general means of communication --
which for a human, might well be a natural-language explanation.


  -Max Hailperin
    Professor of Computer Science
    Gustavus Adolphus College


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.