Chart Parsers--A Structural Description

linus!dartvax!uvm-gen!emerson (Tom "Oliver W. Jones" Emerson)
17 May 87 22:29:22 GMT

          From comp.compilers

Related articles
Chart Parsers--A Structural Description linus!dartvax!uvm-gen!emerson (1987-05-17)
| List of all articles for this month |

Posted-Date: 17 May 87 22:29:22 GMT
From: linus!dartvax!uvm-gen!emerson (Tom "Oliver W. Jones" Emerson)
Keywords: Natural language processing, parsing, PROLOG, parallel, efficiency
Date: 17 May 87 22:29:22 GMT
Organization: EMBA Computer Facility, Univ. of Vermont, Burlington.

Many parsing systems have been developed based on PROLOG
(i.e. Pereira's DCG and Matsumoto's BUP.) These systems
convert grammar rules into PROLOG clauses. Because of
PROLOG's backtracking nature, a majority of parsing systems
developed in PROLOG are based on backtracking. These
systems are often highly inefficient in that identical
computations are duplicated during the parsing
process. Parallel strategies can avoid this inefficiency by
keeping all the information about the parsing process in
memory. These methods are referred to as ''bookkeeping
strategies''. In this article I will give a basic overview of
one such strategy, referred to as a ''chart'' parser.

There are four main features related to chart parsers (which
are designed for highly efficient parsing of CFG's):

1) Parsing Strategy: The chart parser applies rules in a
top down, pseudo-parallel manner.

2) Uses a Chart for Bookkeeping: A chart is a data
structure which stores all the information concerning
what has been tried and what is yet to be tried.

3) Grammar with Left Recursive Rules: Because the chart
stores all information regarding the parsing process,
infinite rule application can be avoided by checking
the chart.

4) No Computational Duplication: Backtracking based
strategies duplicate many computations that are on
separate paths, while the parallel strategy makes it
possible to combine common subcalculations by the use
of bookkeeping (in this case, the chart.)

Chart Structure

The chart is composed of ''vertices'' and ''edges''. A
vertex is an integer that specifies the position of a word
in a sentence. An edge represents a constituent or a
partial constituent, and consists of a starting vertex, an
ending vertex, a label, and a remainder. Consider the
following chart representation of the sentence JOHN GAVE

| | | | | | | | | |
-----N----- -----V---- -----N---- --Det-- -----N-----

This represents the initial chart representation for the
sentence. It consists of only the edges corresponding to
the words and their appropriate lexical categories. the
label field of an edge consists of the left hand side of a
CFG rule (which could be a lexical category, as in the
example above.) The remainder is a part of a sequence of
symbols in the right hand side of a CFG rule (possibly a
null sequence.) The edge can also be represented in the
following notation:

[Starting_Vertex Label --> Ending Vertex Remainder]

For example, the representation of JOHN in the above
sentence is

[1 NP --> NOUN 2]

providing the appropriate CFG has already been defined.

A formal definition of the chart is shown below. It should
be noted that a chart contains Edges and Pending Edges. The
former stores active edges and the latter pending edges.

                                        FORMAL CHART DEFINITION*

CHART := VERTECES (a sequence of vertices)
EDGES (a set of edges)
PENDING EDGES (a set of pending edges)

EDGE := STARTING VERTEX (a vertex in the chart)
ENDING VERTEX (a vertex in the chart)
LABEL (a symbol)
REMAINDER (a sequence of symbols,
possibly empty)

VERTEX := INCOMING EDGES (The set of edges in the
chart having this vertex
as their ending vertex)
OUTGOING EDGES (The set of edges in the
chart having this vertex
as their starting vertex)

*from "Chart Parsing in Concurrent PROLOG" by Hideki
  Hirakawa, ICOT 2nd Laboratory, TR-008, ICOT May 1983

Tom "Hide the Wenches and Batten Down the Access Codes" Emerson

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.