Re: dynamic parsers re parser generators

sarah@telergy.com (Sarah Thompson)
6 Apr 2002 22:47:34 -0500

          From comp.compilers

Related articles
dynamic parsers re parser generators jf-sebastian@attbi.com (2002-03-31)
Re: dynamic parsers re parser generators sarah@telergy.com (2002-04-06)
Re: dynamic parsers re parser generators djowel@gmx.co.uk (joel de guzman) (2002-04-13)
| List of all articles for this month |

From: sarah@telergy.com (Sarah Thompson)
Newsgroups: comp.compilers
Date: 6 Apr 2002 22:47:34 -0500
Organization: http://groups.google.com/
References: 02-03-206
Keywords: parse, design
Posted-Date: 06 Apr 2002 22:47:34 EST

> I used dynamic as a term to describe the behaviour of a parser that
> can adapt to its input. One example of such an input is a phrase in
> the grammar describing the input that follows. Example (although these
> examples may refer to the lexical analysis part of a parsing process):


[snip]


> Question:
> Is there any parser generator/lexical analyzer that delivers this
> functionality and produces C/C++ (portable) code?


The Spirit library (http://spirit.sourceforge.net/) can do this.
However, it's not really a 'full strength' lexer/parser in the sense
of the more familiar tools like lex/yacc/pccts - there is no separate
lexer (i.e. the terminal symbols are characters). Syntactically
grammars look like EBNF, but this is achieved using clever operator
overloading in C++, so no preprocessor is needed. It is also possible
to dynamically create, and to some extent modify, parsers at run time.


I have a personal project under way that addresses this. It is a
portable ISO C++ lexer/parser library that, like Spirit, doesn't use a
separate preprocessor, and uses a lot of C++ operator overloading
tricks to achieve a nice syntax. However, internally, its structure is
quite different from Spirit. It has a DFA based lexer engine that
supports reasonably sophisticated regex matching, but is extremely
quick. This is followed up by a recursive descent parser with LL(k)
capabilities that is designed to generate an abstract syntax tree. The
recursive descent grammars need to be fairly carefully coded (the
usual left recursion elimination and left factoring), but when you do
so performance is very acceptable. Backtracking is fully supported,
with three kinds of alternation available (match first, match
shortest, match longest). This isn't strictly necessary for typical
programming language grammars, but I have an eye on also using this
thing for NL parsing, where that kind of trick can be useful.
EBNF/regex-like constructs in grammars (such as +, *, ?, etc) are also
supported.


I have a lot more to do before I'd regard it as releasable
unfortunately. My signal slot library
(http://sigslot.sourceforge.net/) has been taking up most of my
recreational programming time recently. Nevertheless, the library
already lexes, parses and generates ASTs, but this is going to need a
fair bit of testing before I'd be happy to inflict it on the world. I
also want to extend my AST class to give it STL-like recursive
iterators, and possibly also a generic attribute list mechanism.


The lexer isn't realistically extensible at run time, because it needs
to precompile its state table to achive linear time complexity. Since
the parser need not necessarily be run across the entire source in one
go (it can be used incrementally), it would be possible to modify or
even replace the parser between code fragments.


Sarah


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.