Re: partial parsing

daw@mozart.cs.berkeley.edu (David Wagner)
30 Apr 2001 22:21:46 -0400

          From comp.compilers

Related articles
partial parsing Sonali@SetuIndia.com (Sonali Chitnis) (2001-04-26)
Re: partial parsing daw@mozart.cs.berkeley.edu (2001-04-30)
Re: partial parsing rkrayhawk@aol.com (2001-05-03)
| List of all articles for this month |

From: daw@mozart.cs.berkeley.edu (David Wagner)
Newsgroups: comp.compilers
Date: 30 Apr 2001 22:21:46 -0400
Organization: University of California, Berkeley
References: 01-04-134
Keywords: parse
Posted-Date: 30 Apr 2001 22:21:45 EDT
Originator: daw@mozart.cs.berkeley.edu (David Wagner)

Sonali Chitnis wrote:
>I am working on an editor whose files adhere to a specific grammar.
>My requirement is the Editor should also be able to parse the file
>when the file is yet being created and not only after the file is
>written compeletly.


Although I am not expert in this area, I know that there are many
papers on this topic (look up incremental parsing, etc.), and you
might find them interesting to read.


Nonetheless, despite my infamiliarity with the area, let me mention
one beautiful idea from the literature. I don't know whether it is
practical, but I want to mention it merely because it is so pretty.


In the usual parsing problem, we are given a context-free language L
and a word w, and we must decide whether w is in L.


In your partial parsing problem, I think you are asking us to imagine
a user who might write a program from start to end, and the goal is to
determine whether a partial program is a possible prefix of some legal
program or not. If I got that right, the partial parsing problem can
be formalized in the following way: given a cfl L and a word w, decide
whether there is any word w' in L so that w is a prefix of w'. In
other words, we are asking whether there exists any word in L matching
the regular expression w.Sigma^*, where Sigma is the alphabet.


Thus, we can consider the following problem:
    The "intersection problem": Given a context-free language L and
    a regular language R, decide whether L intersect R is non-empty.
Note that partial parsing can be reduced to this problem, if we let
R be the language recognized by w.Sigma^*; also, the usual parsing
can be reduced to this problem by taking R = {w} (so that R is the
language containing just the word w). Note that in both cases, R
forms a regular language.


Now the interesting part is that most parsing algorithms for general
context-free languages seem to apply not only to the usual parsing
problem but can also be applied to the "intersection problem" stated
above as well. Thus, we can take CYK parsing or Tomita parsing and
use them to solve the partial parsing problem by reducing to a special
case of the "intersection problem".


Or, you can note that the "intersection problem" is easily solved
by methods taught in basic automata theory classes. In particular,
we know that L' := L intersect R is a context-free language if L is
context-free and R is regular, and L' can be effectively computed.
Moreover, we know that given any context-free language L' we can
easily test whether L' is empty or not. These two basic results from
language theory yield algorithms for the "intersection problem".


We can also use this framework to consider more general types of
partial parsing, for instance, where one edits in the middle of the
program. This example can be expressed as some regular expression
like v.Sigma^*.w, where v is the part before what is currently edited
and w is the part after.


Endnote: Strictly speaking, I am describing above recognition problems
(where the answer is yes/no), not parsing problems (where one should
also output a parse tree), but everything I write about recognition
problems generalizes to full parsing as well.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.