|The Tomita Parsing Algorithm (LR(k) with Dynamic Programming) firstname.lastname@example.org (1993-05-23)|
|Re: The Tomita Parsing Algorithm (LR(k) with Dynamic Programming) Bernard.Lang@inria.fr (1993-05-24)|
|Summary:||Dynamic programming parsing of CF languages and extensions|
|Organization:||INRIA - Rocquencourt|
|Date:||Mon, 24 May 1993 15:17:27 GMT|
The fact that "Tomita's" algorithm is a dynamic programming algorithm has
been known for about 15 years (and probably before then, though less
formally). It is essentially a variant of Earley's parsing algorithm
(which had none of Tomita's limitation) and which has been formally shown
to be very similar to the CYK tabular algorithm (see standard textbooks
for all of these) by Graham-Harrison and Ruzzo around 1978, I believe in
ACM-TOPLAS. The latter has been know as a dynamic programming technique
probably since it was published in 1965.
Almost all these algorithms, and later ones (not including Tomita's),
run at worst in time n^3, where n is the length of the input string, and
run often linearly on a large domain (CKY is always n^3).
I published a general dynamic programming (though not called that way)
interpretation of non-deterministic pushdown automata in 1974
(Deterministic Techniques for efficient non-deterministic Parsers, Proc.
of 2nd Coll. on Automata, Languages and Programming, Springer LNCS 14, pp
255-269). This applies in particular to non-deterministic PDAs produced
by the LR(k) construction, and yields an n^3 algorithm, very similar in
essence to the algorithm of Tomita, but without its limitations (works on
all CF grammars, in time at worst n^3, and often faster: for example it
reduces to linear LR(k) parsing when the grammar is LR(k) and the PDA
connstruction technique is LR(k) - but now this seems obvious).
This algorithm was developed formally, but not implemented then: the
interest was limited by the power of the computers of the time. However a
direct construction based on precedence parsing was experimented in
1971-72 (SIGPLAN Notices 6-12, Dec. 1972).
The 1974 algorithm did not produce a shared forest or all possible
parse-trees, but a CF grammar of all leftmost parses (a leftmost parse
being a string of rules used for reduction). It turns out that this CF
grammar is isomophic to a parse forest, which leads to more recent results
indicating that parse forests are nothing but CF grammars (more precisely
a specialization to the parsed input of the original CF grammar --
specialization in the sense of partial evaluation).
Another dynamic programming construction was proposed (Information and
Control 13, pp 186-206) by Aho-Hopcroft and Ullman in 1968, but like the
CKY construction it run in n^3.
These techniques and results extend to a variety of non-CF languages,
and also to the evaluation of logic programs (Horn Clauses), as indicated
by the early results of Pereira and Warren on Earley deduction (ACL'83). A
special case of this are the so called "magic set" techniques used in
Note that breadth-first search is not stricly necessary to handle
non-determinism (cf Sheil, ACL 1976) and all one really needs is a fair
computation strategy. This is particularly important for extensions to
Other constructions have been published which do not have Tomita's
limitations and do have adequate complexity. In the case of LR automata,
and more generally of bottom-up PDAs, the constructions can be simplified.
Several people have independently come up with variants based on the use
of memo functions, but that is essentially a form of dynamic programming.
The algorithm of Tomita, and other dynamic programming algorithms are in
particular discussed in the proceedings of the first International
Workshop on Parsing Technology (1989), 2 volumes with a different name,
both edited by Tomita, and published by Kluwer. They contain some papers
that discuss the limitations of the algorithm, and possible remedies.
There are many other publications on the topic.
The ability to deal with cyclic grammar is probably useless in most
applications. It can however be useful when dealing with some problems of
error recovery on ill-formed input.
Part of my own published contribution to the topic is available (in
compressed format) by ftp at
in directory: INRIA/publication/ChLoE
read the file 00-Index to get detailed contents of the directory.
I do not have an up-to-date bibliography handy, but many references may
be found in these papers.
Return to the
Search the comp.compilers archives again.