Re: Developing/inventing a grammar...

Chris F Clark <cfc@shell01.TheWorld.com>
12 Jul 2005 05:13:45 -0400

          From comp.compilers

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 12 Jul 2005 05:13:45 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 05-07-044
Keywords: parse, design
Posted-Date: 12 Jul 2005 05:13:45 EDT

Oliver wrote:
> Any hints for effectively developing a grammar for a tool where I have
> an idea about how the language could/should look like?
>
> I use lex & yacc, have an idea about what the tool/language should do.
>
> But how to clearly develop a grammar for this?
>
> It's a different task to develop a grammar from scratch then using an
> already existing grammar and implementing it with lex & yacc.


Developing a grammar from scratch is probably not as different from
using an "existing grammar" as you would think. Let me explain why
and how, as part of a method for developing a grammar from scratch.


----------------------------------------------------------------------


The first case to consider is that your new language is a variation on
some existing language. For example, you have C and you want to
invent C++, Java, or C#. In that case, you do "the obvious": start
with a C grammar and make changes in the areas where you want your
new language to differ from the old one. (Note: if you were
using Yacc++, I might recommend doing that by grammar inheritance.)


An interesting variation on this case is where you wish to mix two
languages together, say C and SQL. Again, I would pick one of the
languages as the master template (i.e. what does the overall structure
look like), and then add the second language in as changes to the
first. (Again, Yacc++ has a form of grammar inheritance that allows
one to add parts of one grammar into another.)


----------------------------------------------------------------------


However, let us assume that your language isn't some incremental
variation on an existing language. The borrowing approach is still
worth considering.


For example, your new language will probably have some token
definitions (i.e. how identifiers are spelled, quoting conventions for
strings, how numbers are represented, comment conventions, what
special characters are used, and so forth). I would still crib those
from an existing language that is similar.


If your tokens aren't similar enough to those of some existing
language for you to steal the definitions, I would ask, "why is your
language so different?" I would probably ask that question
repeatedly, until your language was as minimally different at the
token level as possible. And, there is good reason for that. The
hardest part of writing a parser is getting the token definitions
right. It is also the place where, as a language designer, one is
flying most without a net. There is little to guide one in how to
design good tokens, except by seeing how other languages (and
compilers) have solved similar issues. (In over a dozen years of
writing grammars for languages, I have yet to find one for which I
couldn't borrow at least a few token definitions from a previous
language. In most cases, almost all the tokens of any new language
are exactly the same as those of some other language, with perhaps
one or two minor variations.)


So, steal your token definitions from some other language (or set of
languages).
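
To make the borrowing concrete, here is a minimal sketch of a scanner
whose token definitions are cribbed from C-like languages. It is written
in Python rather than lex purely for illustration; the token names, the
choice of operators, and the function name tokenize are assumptions of
this sketch, not part of any particular grammar.

```python
import re

# Token definitions borrowed from C-like languages (an assumption of this
# sketch). Order matters: COMMENT must be tried before OP so that "//" is
# not scanned as two division operators.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),       # /* block */ and // line comments
    ("NUMBER",  r"\d+(?:\.\d+)?"),            # 42 or 3.14
    ("STRING",  r'"(?:\\.|[^"\\\n])*"'),      # double-quoted, backslash escapes
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),   # C-style identifiers
    ("OP",      r"==|!=|<=|>=|[-+*/=<>(){};,]"),
    ("SKIP",    r"\s+"),                      # whitespace is discarded
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets a /* ... */ comment span several lines
)

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

If your language needs, say, a different comment convention, only one
entry in the table changes, which is the "one or two minor variations"
situation described above.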


If your language has expressions, statements, and/or blocks consider
stealing those constructs also. It's not quite as important as at the
lexical level, but you can still leverage the work of others. And,
that's the goal--to borrow working results, so you don't have to make
(and perhaps fix and perhaps leave broken) your own mistakes.


If at all possible, I would also steal the overall "design" from an
existing language. The conceptual world, which may divide source
files into functions or modules or classes, probably has some
coherence to it. Leveraging that framework leverages the learning
users have already put into it.


If one steals well enough, one is really left with defining a few
rules for some specific part of the language that is actually new.
I'll talk about that after a detour.


----------------------------------------------------------------------


So, where can the borrowing process fail? The most obvious way it can
fail is when one lacks an existing grammar that is at all close to the
desired language. For example, I can't think of any good examples of
grammars for makefiles. Now, one wouldn't be hard to write, and some
of the principles mentioned above could be used to help write one.
But, the point still holds. Similarly, I'm not aware of grammars for
indentation-based languages, like Haskell and Python. Still, in all
the above cases, there is some tool that already parses (reads text
in) that language, and one would do well to borrow whatever parsing
technology is already in place (even if it isn't a traditional parser).
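
As one example of such borrowed technology: Python's own tokenizer
handles indentation by emitting INDENT and DEDENT tokens, so that the
grammar proper can treat blocks much like brace-delimited ones. The
following is only a rough sketch of that idea, assuming spaces-only
indentation; the function name indent_tokens and the ("LINE", text)
token shape are made up for this illustration.

```python
def indent_tokens(lines):
    """Turn leading-space changes into INDENT/DEDENT markers.

    Assumption of this sketch: indentation uses spaces only, and each
    non-blank line becomes a single ("LINE", text) token.
    """
    stack = [0]          # indentation widths of the currently open blocks
    out = []
    for line in lines:
        if not line.strip():
            continue     # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            out.append("DEDENT")
        if width != stack[-1]:
            # Dedented to a column that never opened a block.
            raise IndentationError("dedent does not match any outer level")
        out.append(("LINE", line.strip()))
    while len(stack) > 1:    # close blocks still open at end of input
        stack.pop()
        out.append("DEDENT")
    return out
```

With the markers in the token stream, INDENT and DEDENT can play the
role that "{" and "}" play in a grammar borrowed from C.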


If the overall structure of your language really isn't like anything
else, one again comes back to the same question as occurred on the
token level, "why isn't it?"


----------------------------------------------------------------------


Let's assume you have a few constructs that are unique to your
language. You will probably find that those constructs are similar to
some existing construct you can crib from. For example, if it has
alternatives, it might look like either an if-then-else or a
case-statement. If it has an internal list of things, it might look
like parameter arguments or statements within a block. Other things
can be seen as variations on declaration patterns.


One can also look for a recipe for defining one's unique construct.
One of the Yacc++ tutorials is specifically a recipe book for certain
common constructs, such as how to do lists with separators versus
lists with terminators.
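
To illustrate the separator/terminator distinction (this is my own
sketch, not taken from the Yacc++ tutorial), here are the two list
shapes as small hand-written Python parsers over a list of token
strings, with the usual yacc-style rules shown in the comments. The
function names and the choice of "," and ";" are assumptions of the
example.

```python
def _item(tokens, pos, punct):
    """Return the item at tokens[pos], rejecting end of input and punctuation."""
    if pos >= len(tokens) or tokens[pos] == punct:
        raise SyntaxError(f"expected an item at position {pos}")
    return tokens[pos]

def parse_separated(tokens, sep=","):
    """Separator BETWEEN items, as in call arguments: f(a, b, c).

    yacc-style:  list : item
                      | list ',' item
                      ;
    """
    items = [_item(tokens, 0, sep)]
    pos = 1
    while pos < len(tokens):
        if tokens[pos] != sep:
            raise SyntaxError(f"expected {sep!r} at position {pos}")
        items.append(_item(tokens, pos + 1, sep))  # a trailing separator fails here
        pos += 2
    return items

def parse_terminated(tokens, term=";"):
    """Terminator AFTER every item, as in C statements: a; b; c;

    yacc-style:  list : /* empty */
                      | list item ';'
                      ;
    """
    items = []
    pos = 0
    while pos < len(tokens):
        item = _item(tokens, pos, term)
        if pos + 1 >= len(tokens) or tokens[pos + 1] != term:
            raise SyntaxError(f"missing {term!r} after {item!r}")
        items.append(item)
        pos += 2
    return items
```

Note the practical difference: the separated form here requires at least
one item and rejects a trailing separator, while the terminated form
accepts an empty list but insists on the final terminator.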


----------------------------------------------------------------------


The biggest problem with the borrowing approach to language design is
simply that until one has seen enough grammars, one doesn't realize
that one is simply borrowing an idea that has already been pursued and
debugged. One solution to that problem is to borrow expertise. There
are definitely people who will hire themselves out as language design
consultants.


And, if you are going to learn what to borrow by the
school-of-hard-knocks approach, I would recommend starting small.
Take an existing language and add only a few features to it, before
growing a whole new language.


By the way, to summarize the borrowing approach I am suggesting:
start borrowing at the lowest level (usually tokens) and find
things at each level that match (or are close to) the language you
want. The second key point is to ask yourself, each time you have
to change something that you are borrowing, "why do I need it to be
different from what I am borrowing from?"


> Where to start? Starting with the keywords and the precedences?


Yes, start there.
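
As a rough illustration of what "the precedences" amount to, here is a
small Python sketch of a precedence table driving an expression parser
(the precedence-climbing technique, which is a hand-written stand-in for
what yacc's %left and %right declarations give you). The table is
modelled on C's arithmetic operators; the right-associative "^" as an
exponent operator is a hypothetical addition for this example, as are
all the function names.

```python
import re

# Operator table: higher number binds tighter. In yacc this would be a
# sequence of %left / %right lines. "^" as exponentiation is hypothetical.
PRECEDENCE = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "^": (3, "right"),
}

def parse_atom(tokens, pos):
    """Parse a number, identifier, or parenthesised expression."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok in PRECEDENCE or tok == ")":
        raise SyntaxError(f"unexpected {tok!r}")
    return tok, pos + 1

def parse_expr(tokens, pos=0, min_prec=1):
    """Precedence climbing: only consume operators binding at least min_prec."""
    left, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in PRECEDENCE:
        op = tokens[pos]
        prec, assoc = PRECEDENCE[op]
        if prec < min_prec:
            break
        # Left-associative operators must not re-consume their own level.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos

def parse(text):
    """Return a nested (op, left, right) tuple tree for an expression string."""
    # Simplistic scanner for the sketch: unknown characters are silently skipped.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/^()]", text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree
```

Changing how the language groups "1 - 2 - 3" or "1 + 2 * 3" is then a
matter of editing the table, which is why settling the table early is a
reasonable place to begin.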


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------

