Re: some basic questions about parsing

torbenm@diku.dk (Torben AEgidius Mogensen)
16 Sep 1996 14:45:26 -0400

          From comp.compilers

Related articles
some basic questions about parsing mherstin@bbn.com (Michael Herstine) (1996-09-15)
Re: some basic questions about parsing salomon@silver.cs.umanitoba.ca (1996-09-16)
Re: some basic questions about parsing torbenm@diku.dk (1996-09-16)
| List of all articles for this month |

From: torbenm@diku.dk (Torben AEgidius Mogensen)
Newsgroups: comp.compilers
Date: 16 Sep 1996 14:45:26 -0400
Organization: Department of Computer Science, U of Copenhagen
References: 96-09-053
Keywords: parse

Michael Herstine <mherstin@bbn.com> writes:


>I'm reading Jim Holme's "Object Oriented Compiler Construction". In
>the chapter on parsing, he refers to the theorem that to every
>language generated by a context free grammar there corresponds a
>(possibly non-deterministic) pda that solves the recognition problem
>for that language. He notes that we don't want to deal with
>non-deterministic automata, and states that instead, we should find
>the most powerful (deterministic, I suppose) automaton that we can,
>and restrict ourselves to languages that can be recognized by such
>automata. He then introduces the shift-reduce PDA, and states without
>proof that the set of languages that shift-reduce PDAs can recognize
>is called the LR(1) languages (and intimates that the LR(1) languages
>live somewhere between regular languages and context free languages).


>I have several questions:


>1. If we seek the most powerful deterministic automaton possible,
>wouldn't that be a Turing machine? Since they can recognize unrestricted
>phrase structure grammars, a much larger set of grammars than context
>free, why don't we use them as parsers?


John Holme may have forgotten to mention the reason he doesn't want to
handle non-deterministic automata: it takes (almost) cubic time to
simulate these on a sequential machine (using the currently best known
techniques). Turing machines can run for any amount of time, even
forever, so they are not generally useful for parsing unless some
extra restrictions are put on them to guarantee fast termination.
Deterministic PDAs run in linear time, which makes them useful for
parsing. Deterministic 2-way PDA's recognize a larger class of
languages than deterministic 1-way PDA's, including some
non-contextfree languages (e.g. a^n b^n c^n) and can also be simulated
in linear time. They are, however, not used for general parsing
(i.e. as the target of parser generators). There are several reasons
for this: 1) there is no simple way of generating a 2DPDA from a
non-LR grammar. 2) the simulation time, though linear in input, also
depends (linearly) on the size of the automaton. This makes it
generally slower than 1DPDA simulation which is (more or less)
independent of the size of the automaton.


>2. What sort of grammars generate LR(1) languages?


LR(1) grammars, obviously. ;-)
Seriously, it appears that the easiest way to recognize a LR(1)
grammar is to generate a LR(1) parsing table and look for conflicts.
Note, though, that ambigous grammars are never LR(1).


>3. The production rules that he writes down for Pascal are clearly
>context free, but seem to include rules that allow non-terminals to go to
>the null string - doesn't that put them in the class of unrestricted
>phrase structure grammars (they're not allowed in context free grammars,
>are they)?


For context free grammars it is easy to show that adding empty
productions only extends the class of languages to languages that are
the unions of context free languages and the set containing the empty
string. In other words, you can eliminate empty productions by
transforming a grammar and obtain a grammar generating the same
language bar the empty string. Hence, empty productions are pretty
harmless in comtext free grammars, and they are quite convenient. So
most parser generators and compiler textbooks allow these.


Adding empty productions to context sensitive grammars (type 2
grammars in the Chomsky-hierarchy) gives them the power of
unrestricted grammars (type 1 grammars), though.


Torben Mogensen (torbenm@diku.dk)
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.