Re: parser generator terminology

Michiel <mhelvens@gmail.com>
Sun, 6 Sep 2009 13:54:56 -0700 (PDT)

          From comp.compilers

Related articles
parser generator terminology rpboland@gmail.com (Ralph Boland) (2009-09-06)
Re: parser generator terminology mhelvens@gmail.com (Michiel) (2009-09-06)
Re: parser generator terminology DrDiettrich1@aol.com (Hans-Peter Diettrich) (2009-09-06)
Re: parser generator terminology cfc@shell01.TheWorld.com (Chris F Clark) (2009-09-06)
Re: parser generator terminology cfc@shell01.TheWorld.com (Chris F Clark) (2009-09-07)
Re: parser generator terminology haberg_20080406@math.su.se (Hans Aberg) (2009-09-07)
Re: parser generator terminology mhelvens@gmail.com (Michiel) (2009-09-07)
Re: parser generator terminology cfc@shell01.TheWorld.com (Chris F Clark) (2009-09-07)
[4 later articles]
| List of all articles for this month |

From: Michiel <mhelvens@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 6 Sep 2009 13:54:56 -0700 (PDT)
Organization: Compilers Central
References: 09-09-038
Keywords: parse, design
Posted-Date: 06 Sep 2009 19:36:53 EDT

Ralph Boland wrote:


> I am designing my own parser generator tool (please don't advise me on
> the wisdom of creating yet another parser generator tool)


I believe there's a lot to be improved upon in this field. I have
plans to write a parser generator too. But it's far down my list. ;-)


I've always wanted these features in a parser generator:


* An EBNF dialect, which automatically generates lists and optional
values in the output language. Possibly the type of
list<T>/optional<T> and the method for generating them may be
specified separately from the grammar.


* Automatic location tracking. Again, with the code to do this
specified separately from the grammar. In my Flex/Bison code, I'm
finding the same couple of lines in virtually every rule.


* Parameterized grammar rules, to specify the same decoration for
several constructs. For example:


commalist(A): [ A { ',' A } ] ;
formalparameters: commalist( type identifier ) ;
tuple: '(' commalist( expression ) ')' ;


would be equivalent to:


formalparameters: [ type identifier { ',' type identifier } ] ;
tuple: '(' [ expression { ',' expression } ] ')' ;


I believe that with static analysis, such a grammar could always be
automatically transformed to a grammar without parameterized rules.


* A way to output a human-readable grammar with documentation for
grammar rules automatically extracted from the source-code.


Feel free to use or ignore these as you wish.


> ... The tokens correspond to the nonterminals of the
> grammar.


'terminals', I assume you mean.


> From here things are less clear to me.
>
> 1) Is there a name for the definition of the set of tokens; preferably
> a short name useful for naming identifiers? (I do not like regular
> definition since it implies a set of rules I do not follow.)


Not that I know of. What about 'vocabulary'?


> 2) Most parser generator tools actually use attribute grammars, which
> have attributes or semantic actions, and build abstract syntax trees.
> In my system the grammar is not attributed and instead a separate
> table is used to define the attributes and semantic actions associated
> with the grammar rules. I need a name for this table but don't know
> what to call it. Possibilities are attribute table, transformation
> table, and abstract syntax table. I don't like these though because
> they lead to long identifier names. I am considering using tree morph
> table, or just morph table. Individual entries in the morph table
> would be called morphs. This works well for naming identifiers but
> will be cryptic for anybody else but me. Can anybody make a better
> suggestion than those I mentioned?


I've never heard the word 'morph' associated with semantic actions in
a parser. I guess I'd stick to 'semantic actions', 'semantics' or
'actions'.


> 3) Most parser generator tools build parsers that call the scanner to
> get tokens. In my system the parser calls a token fetcher that used
> one or more scanners to get tokens and may then process its input
> before returning its versions of tokens back to the parser. I do not
> have a good name for the token fetcher. One possibility is to call
> the scanner the lexer and the token fetcher the scanner. Can anybody
> suggest a name for my token fetcher?


'Tokenizer'?


> 4) Some systems have specifications for defining methods for walking
> over abstract syntax trees and making further transformations or
> constructions. I am not planning to do this but am curious to know a
> good name for such a specification preferably suitable for identifier
> naming.


That would be 'visitor'. The name comes from the Visitor design
pattern [Design Patterns, GoF]. For a description of this technique,
read:


http://en.wikipedia.org/wiki/Visitor_pattern


Or read the book. I can recommend it. Concrete visitors are usually
called SomethingVisitor. For you, possibly NamingVisitor.


> All suggestions much appreciated.


Good luck with your project.


--
Michiel Helvensteijn



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.