I rewrote Yacc

federation2005@netzero.com
Fri, 18 Sep 2015 12:48:02 -0700 (PDT)

          From comp.compilers

Related articles
I rewrote Yacc federation2005@netzero.com (2015-09-18)
Re: I rewrote Yacc federation2005@netzero.com (2015-12-16)
| List of all articles for this month |
From: federation2005@netzero.com
Newsgroups: comp.compilers
Date: Fri, 18 Sep 2015 12:48:02 -0700 (PDT)
Organization: Compilers Central
Keywords: tools, parse, available
Posted-Date: 21 Sep 2015 01:02:33 EDT

This is actually just the first stage ("normalization") of a long process,
with simplifications, better use of the language (C), and other efficiencies
and changes in the layout. Right now it's around 4000 lines + "skeleton" code.
So, for the time being, regression tests all pass and equivalency is
preserved.


I pulled back from a decision to try and collate this with the Java
version (byacc/J) deferring that to a later time. However I will be
collating this with Bison -- which has also been undergoing the
normalization process for a while now -- before further analysis and
collation takes place.


Berkeley Yacc has included a feature seen in Prolog's DCG parser: the
ability to explicitly parametrize non-terminals. It has also included
a feature ("backtracking") that apparently is paving the way to move
up from BNF to EBNF.


Bison, on the other hand, has included facility for handling the Tomita
parsing formalism (which these days apparently is called GLR). This is
something I've been working with since around 1985-1986 (when Tomita's book
was fresh off the press) -- and actually had some demo software in the
comp.compilers archive for a while in the 1990's. It will also be collated in
with the code. The LR part of this code is dramatically simpler (yet more
robust) than what's used in Yacc or Bison. So it is likely that the innards
are going to be completely gutted and replaced with my stuff.


During the intervening period (since the late 1980's) I developed a framework
(descended ultimately from the Chomsky-Schuetzenberger Theorem) that
seamlessly extends the algebra of regular and rational expressions up from
type 3 to type 2 in the Chomsky Hierarchy into a full-fledged calculus for
"context free expressions". I no longer use the Tomita framework because of
some serious deficiencies that become readily visible from the vantage point
of this latter framework. Though not published,the mathematical foundation for
it has been (2008 under LNCS 4988) -- a *complete* and *consistent* (but
second order theory) which has recently seen equivalent formulations (e.g. the
"Chomsky Algebras" of 1309.0893v1) published in the literature.


There are also serious inefficiencies underlying the LR method itself that
become readily apparent when rendered in the language of context free
expressions -- much of which not only serves as an obstacle to doing EBNF (=
BNF with regular expressions on the right hand sides of rules) but even "CBNF"
(BNF with full-fledged context free expressions on the right hand sides of
rules).


Following normalization/simplification and collation I plan on changing
everything over to a foundation based on context-free expressions. There are
actually multiple compilation methods (each of which serves as an algebraic
version of a proof to the Chomsky-Schuetzenberger Theorem) that can be used --
including one that captures the LR methodology. One of the things I've been
toying with is "semi-LR" where the underlying "characteristic FA" is left as
an NFA (the NFA -> DFA conversion of the characteristic FA is actually the
whole deal of what's going on with item sets).


There are other things I'll be experimenting with. For instance, the
compilation method I normally use (and heretofore I've done all parsers by
hand and lots of them) has a close link to LR but seems to by-pass much of
what's involved there. After a long period (> 2 decades) I have yet to prove
its equivalence with the LR method. So there will be a lot of
experimentation.


I've also intended all along to include most or all the features of Prolog's
DCG formalism (particularly the handling of operator precedences) and to even
turn everything around backwards to make Prolog itself run as a parametrized
non-deterministic push down transducer. (So I've also spent some time in the
recent past redoing the WAM formalism into a version based on the mathematical
theory known as "magmas" but that's another story for another time).


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.