Re: How to construct a grammar?

Chris F Clark <cfc@shell01.TheWorld.com>
Sun, 06 Jul 2008 15:12:08 -0400

          From comp.compilers

Related articles
How to construct a grammar? michael@schuerig.de (Michael Schuerig) (2008-07-05)
Re: How to construct a grammar? Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2008-07-05)
Re: How to construct a grammar? cfc@shell01.TheWorld.com (Chris F Clark) (2008-07-05)
Re: How to construct a grammar? michael@schuerig.de (Michael Schuerig) (2008-07-05)
Re: How to construct a grammar? cfc@shell01.TheWorld.com (Chris F Clark) (2008-07-06)
Re: How to construct a grammar? ArarghMail806@Arargh.com (2008-07-06)
Re: How to construct a grammar? quinn_jackson2004@yahoo.ca (Quinn Tyler Jackson) (2008-07-07)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Sun, 06 Jul 2008 15:12:08 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 08-07-011 08-07-015 08-07-017
Keywords: parse, design
Posted-Date: 06 Jul 2008 16:21:41 EDT

Michael Schuerig <michael@schuerig.de> writes:


> If I understand you correctly, your advice (snipped in this reply) would
> be to first identify the words of the language (lexer), then work
> top-down from sentence level.


Actually, one other piece of advice that you missed, probably because
I didn't state it clearly. Look at (and for) other grammars that
might have features similar to your grammar. There are lots of solved
problems in the language world, complete with grammars.


I just want to list some idioms that are interesting and places to
look.


include files -- find a C preprocessor grammar
                this idiom is so useful we put some special features in Yacc++
                to make it easy


macros -- again look at a C preprocessor grammar


reserved words (also called keywords)
                  usually tool specific, but knowing how to integrate a symbol
                  table into the grammar is a big plus here, also can be done
                  in the lexer with a completely different technique


keywords that can also be identifiers -- find a PL/I grammar
                  there is a simple solution for this that most people don't get
                  on the other hand, it can be a dubious language feature


typed names (aka name mangling) -- look into C++
            not every feature is done in the grammar


scoped names -- Algol pioneered this
              another case where the work doesn't necessarily involve the grammar


block structure via indentation -- look at Haskell (and Python, I think)


Reading an SQL grammar, both for the "base" language and for some
particular implementation is good.


I would also read a BASIC grammar, the original language, the ANSI
standard, and for Microsoft's latest flavor if I could get them.


Oh, and before all of the above, I would read a Pascal grammar. The
language isn't perfect, but the grammar is simple and clear, designed
to be learned from.


Also, most enlightening, comparing the ANSI C grammar to one done with
precedence. It shows why precedence is much more terse (and at the
same time more clear), but you also want to know the way to do nested
operators by expressions, especially if you are going to use a tool
like ANTLR or JavaCC which doesn't implement precedence directives.


Even with all those suggestions, I think one needs to read only about
3-5 grammars to get the basic drift, especially if one has 3-5
languages which are close to the one wants to implement. In reality
there are probably half-a-dozen to a dozen idioms one needs to know.
Moreover, many of them are so obvious, that they fall-out naturally.


A good example of this is the if-then-else idiom. In theory, this
isn't LL. In fact, one solution to that problem is the use of
if-then-else-fi, which does have an LL solution. However, it turns
out that there is a natural way to hack if-then-else into a recursive
descent parser that most generators accept and people use without even
thinking about. And, there is a corresponding short-cut idiom for LR
parsers that has the same effect, although there is a precise way to
write the code to be correct for LR.


In any case, the devil is in the details, as they say. Most parts of
writing grammars isn't hard, and starting top-down is a good way to
work. It's when you get to the tricky part of the language, that you
need to find a clever solution, and those solutions often exist in a
grammar somewhere, but they haven't yet gotten cast into patterns.
It's just that there isn't a pattern community for parser writing,
partially because the community is fragmented by the different tools,
because the idioms for ANTLR are often different than those for yacc.


Well, I dragged this point out enough, still I


Hope this helps,
-Chris


******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
------------------------------------------------------------------------------


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.