Re: parser generator terminology

Michiel <mhelvens@gmail.com>
Sun, 6 Sep 2009 13:54:56 -0700 (PDT)

From comp.compilers

Related articles
parser generator terminology rpboland@gmail.com (Ralph Boland) (2009-09-06)
*Re: parser generator terminology mhelvens@gmail.com (Michiel)* (2009-09-06)**
Re: parser generator terminology DrDiettrich1@aol.com (Hans-Peter Diettrich) (2009-09-06)
Re: parser generator terminology cfc@shell01.TheWorld.com (Chris F Clark) (2009-09-06)
Re: parser generator terminology cfc@shell01.TheWorld.com (Chris F Clark) (2009-09-07)
Re: parser generator terminology haberg_20080406@math.su.se (Hans Aberg) (2009-09-07)
Re: parser generator terminology mhelvens@gmail.com (Michiel) (2009-09-07)
Re: parser generator terminology cfc@shell01.TheWorld.com (Chris F Clark) (2009-09-07)
[4 later articles]

| List of all articles for this month |

From:	Michiel <mhelvens@gmail.com>
Newsgroups:	comp.compilers
Date:	Sun, 6 Sep 2009 13:54:56 -0700 (PDT)
Organization:	Compilers Central
References:	09-09-038
Keywords:	parse, design
Posted-Date:	06 Sep 2009 19:36:53 EDT

Ralph Boland wrote:

> I am designing my own parser generator tool (please don't advise me on
> the wisdom of creating yet another parser generator tool)

I believe there's a lot to be improved upon in this field. I have
plans to write a parser generator too. But it's far down my list. ;-)

I've always wanted these features in a parser generator:

* An EBNF dialect, which automatically generates lists and optional
values in the output language. Possibly the type of
list<T>/optional<T> and the method for generating them may be
specified separately from the grammar.

* Automatic location tracking. Again, with the code to do this
specified separately from the grammar. In my Flex/Bison code, I'm
finding the same couple of lines in virtually every rule.

* Parameterized grammar rules, to specify the same decoration for
several constructs. For example:

commalist(A): [ A { ',' A } ] ;
formalparameters: commalist( type identifier ) ;
tuple: '(' commalist( expression ) ')' ;

would be equivalent to:

formalparameters: [ type identifier { ',' type identifier } ] ;
tuple: '(' [ expression { ',' expression } ] ')' ;

I believe that with static analysis, such a grammar could always be
automatically transformed to a grammar without parameterized rules.

* A way to output a human-readable grammar with documentation for
grammar rules automatically extracted from the source-code.

Feel free to use or ignore these as you wish.

> ... The tokens correspond to the nonterminals of the
> grammar.

'terminals', I assume you mean.

> From here things are less clear to me.
>
> 1) Is there a name for the definition of the set of tokens; preferably
> a short name useful for naming identifiers? (I do not like regular
> definition since it implies a set of rules I do not follow.)

Not that I know of. What about 'vocabulary'?

> 2) Most parser generator tools actually use attribute grammars, which
> have attributes or semantic actions, and build abstract syntax trees.
> In my system the grammar is not attributed and instead a separate
> table is used to define the attributes and semantic actions associated
> with the grammar rules. I need a name for this table but don't know
> what to call it. Possibilities are attribute table, transformation
> table, and abstract syntax table. I don't like these though because
> they lead to long identifier names. I am considering using tree morph
> table, or just morph table. Individual entries in the morph table
> would be called morphs. This works well for naming identifiers but
> will be cryptic for anybody else but me. Can anybody make a better
> suggestion than those I mentioned?

I've never heard the word 'morph' associated with semantic actions in
a parser. I guess I'd stick to 'semantic actions', 'semantics' or
'actions'.

> 3) Most parser generator tools build parsers that call the scanner to
> get tokens. In my system the parser calls a token fetcher that used
> one or more scanners to get tokens and may then process its input
> before returning its versions of tokens back to the parser. I do not
> have a good name for the token fetcher. One possibility is to call
> the scanner the lexer and the token fetcher the scanner. Can anybody
> suggest a name for my token fetcher?

'Tokenizer'?

> 4) Some systems have specifications for defining methods for walking
> over abstract syntax trees and making further transformations or
> constructions. I am not planning to do this but am curious to know a
> good name for such a specification preferably suitable for identifier
> naming.

That would be 'visitor'. The name comes from the Visitor design
pattern [Design Patterns, GoF]. For a description of this technique,
read:

http://en.wikipedia.org/wiki/Visitor_pattern

Or read the book. I can recommend it. Concrete visitors are usually
called SomethingVisitor. For you, possibly NamingVisitor.

> All suggestions much appreciated.

Good luck with your project.

--
Michiel Helvensteijn

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: parser generator terminology

Michiel <mhelvens@gmail.com>Sun, 6 Sep 2009 13:54:56 -0700 (PDT)

Michiel <mhelvens@gmail.com>
Sun, 6 Sep 2009 13:54:56 -0700 (PDT)