Re: Parsing with infinite lookahead (Henry G. Baker)
Mon, 28 Feb 1994 16:38:06 GMT

          From comp.compilers

Related articles
Parsing with infinite lookahead (Matt Timmermans/MSL) (1994-02-22)
Re: Parsing with infinite lookahead (1994-02-24)
Parsing with infinite lookahead (Stephen J Bevan) (1994-02-24)
Re: Parsing with infinite lookahead (1994-02-24)
Re: Parsing with infinite lookahead (Terence Parr) (1994-02-25)
Re: Parsing with infinite lookahead corbett@lupa.Eng.Sun.COM (1994-02-26)
Re: Parsing with infinite lookahead (1994-02-27)
Re: Parsing with infinite lookahead (1994-02-28)
Re: Parsing with infinite lookahead (1994-03-01)
Re: Parsing with infinite lookahead (1994-03-02)
Re: Parsing with infinite lookahead (1994-03-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (Henry G. Baker)
Keywords: parse, history
Organization: Compilers Central
References: 94-02-174 94-02-190
Date: Mon, 28 Feb 1994 16:38:06 GMT

Many early parsers for natural language (NL) -- e.g., English -- utilized
context free (CF) grammars souped up with additional hooks for things like
subject/verb agreement. Vaughan Pratt's (Bachelor's? Master's?) thesis
implemented such a NL parser on an 8K PDP-8. That's a total of 12K bytes,
by the way! (Hint: it didn't do any swapping either, as I recall.) I
think that C's printf takes more room than that.

Bill Woods's 'Lunar Rocks' parser was written in Lisp in standard top-down
(recursive descent) parser format, except that it could back up
arbitrarily in Prolog-ish fashion. The ability to back up arbitrarily far
enables you to handle any amount of ambiguity. I hope that you have a lot
of time on your hands though. Most Prolog CF parsers utilize
nondeterministic backup for the same purpose.

Terry Winograd's 'SHRDLU' parser was written in Carl Hewitt's Planner
system (pre-Prolog Prolog). Terry became very disenchanted with CF
grammars and backtracking for Natural Language as a result of this

Vaughan Pratt did another Lisp-based CF parser called Lingol while at MIT
in the 1970's. Lingol is a full CF parser with additional hints and
extensions on the grammar rules. Unlike Woods's parser, Lingol is
bottom-up, and generates _all_ parses in O(n^3) time and space. For
ambiguous sentences, you get a fully factored parse tree in which the
potentially infinite number of different parses are stored in cubic space.
Lingol uses the Lisp 'hints' to decide which of the ambiguous parses are
to be selected. Lingol uses no backup. I extended Lingol to handle
ambiguity in the word morphology level, which is required if there is
noise (e.g., misspellings) in the input. (I still have a version of
Lingol for Common Lisp if anyone is interested. I'll have to check with
Vaughan re copyright, however.)

If you want to do extensive error repair and try to make the most of what
a user has given you in some mangled form, a Lingol-style parser is to be
recommended. Its use of top-down goals, as well as bottom-up parsing, can
make the most of the input, even if a parse cannot be constructed which
covers the whole of the input.

It _is_ possible to do CF parsing of an arbitrary CF grammer in better
than cubic time. Les Valiant came up with a recursive hack similar in
spirit to Strassen's matrix multiplication hack, which allows better than
cubic time. I am not aware that anyone has ever implemented this, and
most people expect that the sentences would have to be enormous for this
scheme to pay off.

    -- Henry Baker


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.