Dissertation on parsing

schoebel@informatik.uni-stuttgart.de (Thomas Schoebel-Theuer)
15 Sep 1997 21:37:01 -0400

          From comp.compilers

Related articles
Dissertation on parsing schoebel@informatik.uni-stuttgart.de (1997-09-15)
| List of all articles for this month |

From: schoebel@informatik.uni-stuttgart.de (Thomas Schoebel-Theuer)
Newsgroups: comp.compilers
Date: 15 Sep 1997 21:37:01 -0400
Organization: Informatik, Uni Stuttgart, Germany
Keywords: report, parse, available

I am happy to announce my thesis on parsing theory:

Thomas Schoebel-Theuer,
Ein Ansatz fuer eine allgemeine Theorie kontextfreier Spracherkennung,
Shaker Verlag, 1997, ISBN 3-8265-2923-5.

English translation of the title:

An approach to a general theory of context-free recognition.

Translation of the abstract:

A uniform theory for recognition of context-free languages is presented,
containing important well-known algorithms (e.g. Earley, LL(k), LC(k),
deterministic and nondeterministic LR, as well as variants) as a special
case. By introduction of the concept of a recognition tree, known
methods can be simulated, while variants can be formed by different
applications of the general compression theorem. New algorithms can be
obtained by different representations of sentential forms.

The well-introduced opinion that LL-methods were based on left derivation
in contrast to LR methods which were based on right derivation, is
disproved. It is shown that both methods are representing both kinds
of derivation, in the same way. Additionally, the contemporary classification
of parsing methods in top-down and bottom-up methods is shown to
be questionable.

From the theory, new methods can be systematically derived, where the
deterministic special case will enlarge the palette of known methods.
For example, certain ambiguous grammars can be deterministically parsed.
Further conclusions are related to the representation of all parse
trees in space O(n^2), and to connections to semantic evaluations.


Well, this abstract will not tell the English-speaking readers too much,
so here is a translation of the core thesis of the dissertation:

All directional parsing methods known to the author stem from a single
source. This source is the representation of sentential forms by trees
or by compressed trees which are compressed to a graph. All known
deterministic methods may be viewed as special case of more general
nondeterministic methods; it is much easier to derive deterministic
methods from nonterministic ones with the aid of our theory than going
the often-practised opposite way of generalizing deterministic methods
to nondeterministic ones.

Earley, LL(k) and LR(k) methods are not distinct to each other by
alternative representation of left vs. right sentential forms
(as pointed out in many text books). The difference is not with respect
to the kind of represented sentential forms: all mentioned methods
may represent left and right sentential forms depending on the usage
of a representation function. Similarly, the notions "top-down" and
"bottom-up" are not adequate for a separation criterion between LL(k)
and LR(k).
The real difference between LL(k) and LR(k) is the "point in time" where
the deterministic parsing decision is taken.

The introduction of lookahead and it's use for restriction of parsing
decisions is a concept which is orthogonal to the representation of
sentential forms; it can be applied to any kind of representation.
For example, the LALR(k) lookahead can be applied to the LL(k) class
of methods, which we would call LALL(k).

For a factored representation of all syntax trees, using arbitrary
context-free grammars, O(n^2) space is sufficient; contradictory
claims from the literature will be disproved.

The theory of context-free language recognition can be exposed by a rather
smaller theoretical kernel than was used to date. Different representations
of sentential forms can be combined with different lookahead restrictions
and with different precomputing strategies (for example the newly introduced
concept of macros). Using such combinations, not only the directional
methods known from the literature can be described, but also new parsing
methods can be created in a systematic way.


Any private person interested in the dissertation, able to read German, and
working in the area may ask me personally (per email) for a free copy of
the dissertation, I'll try to carry out such wishes. All others are advised
to buy it from Shaker Verlag :-) [YNWIM]

-- Thomas

P.S. I'm working on an english-language paper on O(n^2) space syntax
tree recognition. Please indicate your interest for proof-reading it,
if you promise to not circulate it ;-)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.