15 Sep 1997 21:37:01 -0400

Related articles |
---|

Dissertation on parsing schoebel@informatik.uni-stuttgart.de (1997-09-15) |

From: | schoebel@informatik.uni-stuttgart.de (Thomas Schoebel-Theuer) |

Newsgroups: | comp.compilers |

Date: | 15 Sep 1997 21:37:01 -0400 |

Organization: | Informatik, Uni Stuttgart, Germany |

Keywords: | report, parse, available |

I am happy to announce my thesis on parsing theory:

Thomas Schoebel-Theuer,

Ein Ansatz fuer eine allgemeine Theorie kontextfreier Spracherkennung,

Shaker Verlag, 1997, ISBN 3-8265-2923-5.

English translation of the title:

An approach to a general theory of context-free recognition.

Translation of the abstract:

A uniform theory for recognition of context-free languages is presented,

containing important well-known algorithms (e.g. Earley, LL(k), LC(k),

deterministic and nondeterministic LR, as well as variants) as a special

case. By introduction of the concept of a recognition tree, known

methods can be simulated, while variants can be formed by different

applications of the general compression theorem. New algorithms can be

obtained by different representations of sentential forms.

The well-introduced opinion that LL-methods were based on left derivation

in contrast to LR methods which were based on right derivation, is

disproved. It is shown that both methods are representing both kinds

of derivation, in the same way. Additionally, the contemporary classification

of parsing methods in top-down and bottom-up methods is shown to

be questionable.

From the theory, new methods can be systematically derived, where the

deterministic special case will enlarge the palette of known methods.

For example, certain ambiguous grammars can be deterministically parsed.

Further conclusions are related to the representation of all parse

trees in space O(n^2), and to connections to semantic evaluations.

=======

Well, this abstract will not tell the English-speaking readers too much,

so here is a translation of the core thesis of the dissertation:

All directional parsing methods known to the author stem from a single

source. This source is the representation of sentential forms by trees

or by compressed trees which are compressed to a graph. All known

deterministic methods may be viewed as special case of more general

nondeterministic methods; it is much easier to derive deterministic

methods from nonterministic ones with the aid of our theory than going

the often-practised opposite way of generalizing deterministic methods

to nondeterministic ones.

Earley, LL(k) and LR(k) methods are not distinct to each other by

alternative representation of left vs. right sentential forms

(as pointed out in many text books). The difference is not with respect

to the kind of represented sentential forms: all mentioned methods

may represent left and right sentential forms depending on the usage

of a representation function. Similarly, the notions "top-down" and

"bottom-up" are not adequate for a separation criterion between LL(k)

and LR(k).

The real difference between LL(k) and LR(k) is the "point in time" where

the deterministic parsing decision is taken.

The introduction of lookahead and it's use for restriction of parsing

decisions is a concept which is orthogonal to the representation of

sentential forms; it can be applied to any kind of representation.

For example, the LALR(k) lookahead can be applied to the LL(k) class

of methods, which we would call LALL(k).

For a factored representation of all syntax trees, using arbitrary

context-free grammars, O(n^2) space is sufficient; contradictory

claims from the literature will be disproved.

The theory of context-free language recognition can be exposed by a rather

smaller theoretical kernel than was used to date. Different representations

of sentential forms can be combined with different lookahead restrictions

and with different precomputing strategies (for example the newly introduced

concept of macros). Using such combinations, not only the directional

methods known from the literature can be described, but also new parsing

methods can be created in a systematic way.

======

Any private person interested in the dissertation, able to read German, and

working in the area may ask me personally (per email) for a free copy of

the dissertation, I'll try to carry out such wishes. All others are advised

to buy it from Shaker Verlag :-) [YNWIM]

-- Thomas

P.S. I'm working on an english-language paper on O(n^2) space syntax

tree recognition. Please indicate your interest for proof-reading it,

if you promise to not circulate it ;-)

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.