Diplodicus: A parser with infinite tokens of lookahead

"tj bandrowsky" <tbandrow@unitedsoftworks.com>
24 Jul 2002 02:22:19 -0400

From comp.compilers

Related articles
*Diplodicus: A parser with infinite tokens of lookahead tbandrow@unitedsoftworks.com (tj bandrowsky)* (2002-07-24)**
Re: Diplodicus: A parser with infinite tokens of lookahead Mark.van.den.Brand@cwi.nl (M.G.J. van den Brand) (2002-07-25)
Re: Diplodicus: A parser with infinite tokens of lookahead tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-07-31)
Re: Diplodicus: A parser with infinite tokens of lookahead clint@0lsen.net (Clint Olsen) (2002-07-31)
Re: Diplodicus: A parser with infinite tokens of lookahead tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-08-23)

| List of all articles for this month |

From:	"tj bandrowsky" <tbandrow@unitedsoftworks.com>
Newsgroups:	comp.compilers
Date:	24 Jul 2002 02:22:19 -0400
Organization:	http://groups.google.com/
Keywords:	parse, tools
Posted-Date:	24 Jul 2002 02:22:19 EDT

Introduction:
I got myself into the peculiar situation where I simultaneously needed
an XML parser and a parser for a language product that I am writing.
I looked at yacc but all of the problems of the single token of
lookahead and ambiguous grammars really irked me. Although I'm not by
any stretch of the imagination a computer science guru, I set out to
go and see if I could not write a better parser generator. Did I
succeed, I think I did, but I need some peer review but don't know
where to get it.

Description:
I called it diplodicus because I was worried that it would perform
badly when disambiguating lots of rules.

Diplodicus right now is an object that takes:

one or more tokenizer rules, in the form of limited regular
expressions.

for example:

parser->add_lexical_rule( "rule", "[abc]" );

there may be more than one expression that satisfies each rule.

grammar rules are similarly defined.

parser->add_grammar_rule( "grule", "rule anotherrule rule" );

diplodicus does the shift reduce parse thing. it scans tokens based
on lexical rule and pushes them onto a stack. the stack is then
searched for unambiguous matches to grammar rules:

parser->add_grammar_rule( 1, "grule", "rule anotherrule rule other
stuff" );
parser->add_grammar_rule( 2, "arule", "rule anotherrule rule stuff" );
parser->add_grammar_rule( 3, "arule", "rule anotherrule rule stuff
stuff trouble stuff" );
parser->add_grammar_rule( 4, "arule", "rule anotherrule rule stuff
stuff trouble stuff and stuffing" );

are all unique rules. In diplodicus's "mind", there are is no
lookahead at at all, rules simply are not posted as satisfied until
they unambiguously match. So, when the input is,

rule anotherrule

given the above grammar, diplodicus knows there are 4 potential rules,
and keeps parsing until one and only of them can possibly match. As
each rule is satisfied, an on_reduce member function is called. This
can be overloaded to do your own stuff.

This sort of thing seems useful, and if, so, where would I post it,
announce it, etc, so that someone could have a look at it and let me
know if I am crazy or just reinventing the wheel.

If there's a product or open source project that does this, I'd like
to know.

my e-mail address is tbandrow@unitedsoftworks.com

have a nice day.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Diplodicus: A parser with infinite tokens of lookahead

"tj bandrowsky" <tbandrow@unitedsoftworks.com>24 Jul 2002 02:22:19 -0400

"tj bandrowsky" <tbandrow@unitedsoftworks.com>
24 Jul 2002 02:22:19 -0400