Re: Shallow Parsing

smryan@garth.uucp (Steven Ryan)
17 Jun 88 19:08:37 GMT

          From comp.compilers

Related articles
Shallow Parsing smryan@garth.uucp (1988-06-13)
Re: Shallow Parsing bart@reed.dec.com (Bart Massey) (1988-06-16)
Re: Shallow Parsing smryan@garth.uucp (1988-06-17)

Newsgroups: sci.lang,comp.compilers
References: <1114@ima.ISC.COM>
Organization: INTERGRAPH (APD) -- Palo Alto, CA

In article <1114@ima.ISC.COM> Bart Massey <bart@reed.dec.com> writes:
>I don't believe that the above argument about time is correct. My guess is
>that a human parsing *sentences of text* is much like a compiler parsing
>*sequences of statements*... Certainly such a behavior will occur in
>"linear time" on the number of sequences parsed. In fact, this is probably
>part of the reason why languages have statements and texts have sentences.


Yes, I'm thinking that we divide the stream into sentence-like entities.
In spoken language we use pauses to signify a break (and catch our breaths),
plus a variety of gestures and eye movements. In written language we use end
marks and paragraphs. Programming languages use begin/end/if/... keywords
that mark out the overall structure.


Dividing the stream could be done in a context-free, deterministic fashion.
It may be that the contents of each sentence are then parsed differently.
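As a sketch of that division step (the terminator set and token format here are invented for illustration, not drawn from any particular language):

```python
# A minimal sketch of deterministic stream division: split a token stream
# into sentence-like chunks at terminator tokens, in one left-to-right pass
# with no lookahead beyond the current symbol and no backtracking.

def split_stream(tokens, terminators=frozenset({";", ".", "end"})):
    """Divide a token stream into sentence-like chunks.

    Each terminator closes the current chunk; a single linear scan
    suffices, so the division is linear in the stream length.
    """
    chunks, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in terminators:
            chunks.append(current)
            current = []
    if current:                      # trailing, unterminated chunk
        chunks.append(current)
    return chunks

print(split_stream(["a", "=", "1", ";", "if", "a", "then", "b", "end"]))
```

One pass over the stream, so the divider runs in linear time on the number of sentences, matching the behaviour discussed above; each chunk can then be handed to whatever parser suits its contents.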


>the semantics of the language which will determine how parse time grows with
>*program* length. But I'm not sure what this tells me -- in particular,
>both humans and compilers seem to do quite well at recovering information
>about distant tokens from memory in effectively *constant time* for the


I don't know about humans, but most compilers use a hash table for the symbol
table. In that case the time to identify one symbol out of n distinct symbols
is O(n/m) for a hash table of size m. If m is larger than n, lookup is constant
time. However, if n is large enough, the hash table only divides the search
time by a constant factor: the search down any one hash chain remains O(n)
(usually) or O(log n) (for a sophisticated table).
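The chained-table arithmetic above can be made concrete with a small sketch; the class and its methods are hypothetical, not taken from any real compiler:

```python
# A sketch of a chained hash symbol table: with table size m, looking up
# one of n distinct symbols scans an average chain of length n/m, which
# is constant while m exceeds n but grows linearly once n dwarfs m.

class SymbolTable:
    def __init__(self, m=64):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, name, info):
        chain = self.buckets[hash(name) % self.m]
        for i, (k, _) in enumerate(chain):
            if k == name:            # redefinition replaces the old entry
                chain[i] = (name, info)
                return
        chain.append((name, info))

    def lookup(self, name):
        chain = self.buckets[hash(name) % self.m]
        for k, info in chain:        # linear scan down one chain: O(n/m)
            if k == name:
                return info
        return None

syms = SymbolTable(m=8)
syms.insert("x", "int")
syms.insert("f", "function")
print(syms.lookup("x"))
```

Replacing each chain's linear scan with a balanced tree would give the O(log n) behaviour of the "sophisticated table" mentioned above.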


>Does it pay to recognize this idiom separately from other for() loops? Why
>or why not?


Also, I'm wondering about the tradeoff of deterministic versus nondeterministic
grammars, languages, and parsing. For those unfamiliar with these terms: if a
parser merrily trips through a string left to right and, given everything it
has seen so far plus the current symbol and maybe k symbols to the right,
always knows exactly which production it is in, and this is true for a fixed
k, the parser is deterministic. If it requires arbitrarily large k (the
lookahead), it is nondeterministic. Or the grammar might be ambiguous.
(Deterministic languages are not inherently ambiguous: each has an LR(1)
grammar, and LR(1) grammars are not ambiguous.)
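A toy illustration of fixed lookahead, using an invented grammar S -> 'a' S | 'b': a predictive recognizer inspects exactly one symbol (k = 1) at each step, always knows which production applies, and so never backtracks.

```python
# An LL(1)-style recognizer for the toy grammar S -> 'a' S | 'b'.
# One symbol of lookahead decides the production: 'a' selects S -> a S,
# 'b' selects S -> b, anything else is an error. The grammar and the
# recognizer are invented for illustration.

def recognize(s):
    i = 0
    while i < len(s) and s[i] == "a":   # lookahead 'a': apply S -> a S
        i += 1
    # lookahead 'b' closes the derivation; input must end right after it
    return i < len(s) and s[i] == "b" and i + 1 == len(s)

print(recognize("aaab"))  # True
print(recognize("aa"))    # False: no closing 'b'
```

A nondeterministic construct, by contrast, would force the parser to look arbitrarily far ahead before committing to a production, which is what rules out this kind of single-pass decision.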


>[I always thought that computers use unambiguous context free grammars because
>we know how to parse them fast, and because they seem empirically to be
>convenient for humans to try to express things like computer programs. I
>also think that something like the current parsing methods is probably
>time-optimal for conventional computer architectures.


I suspect it has more to do with parser generators: a parser can be
automatically generated for an LR(1) grammar, but not so for a
nondeterministic language. (Personal remark:) I think Wirth has crippled a
generation of programming languages by stripping them of anything that makes
life tough for a compiler writer (my occupation, if you haven't already
guessed).


> Ned Irons looked at
>context-free parsing on small numbers of parallel processors and decided that


I first thought about this with respect to CDC Cyber 205 and ETA-10x vector
machines. If somebody can find a way to vectorise parsing, I think it will
open up a vast field of applications which currently appear to be scalar or
sequential.


                                                                            Thank you for your support,
                                                                                                  sm ryan
[Using Earley's algorithm you can parse ambiguous grammars without trouble,
although it is much slower than LR or LALR on the grammars that the latter
two can handle. -John]
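The Earley approach mentioned in the note can be sketched as a bare recognizer; the grammar encoding and the sample ambiguous grammar E -> E + E | x are assumptions for illustration, and no attention is paid to efficiency:

```python
# A sketch of an Earley recognizer. Unlike LR/LALR it copes with
# ambiguous grammars, at cubic worst-case cost. An item (lhs, rhs, dot,
# origin) means: production lhs -> rhs, parsed up to position dot,
# starting at input position origin.

def earley(grammar, start, tokens):
    """grammar maps each nonterminal to a list of right-hand sides
    (tuples of symbols); a symbol not in grammar is a terminal."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        added = True
        while added:                    # run predictor/completer to a fixpoint
            added = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for r in grammar[rhs[dot]]:
                        if (rhs[dot], r, 0, i) not in chart[i]:
                            chart[i].add((rhs[dot], r, 0, i)); added = True
                elif dot == len(rhs):
                    # Completer: advance items waiting on this nonterminal.
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            if (l2, r2, d2 + 1, o2) not in chart[i]:
                                chart[i].add((l2, r2, d2 + 1, o2)); added = True
        if i < n:
            # Scanner: match a terminal against the next input token.
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] not in grammar \
                        and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

# An ambiguous expression grammar: E -> E + E | x
grammar = {"E": [("E", "+", "E"), ("x",)]}
print(earley(grammar, "E", ["x", "+", "x", "+", "x"]))  # True
```

The ambiguity shows up in the chart as multiple completed derivations of the same span; a recognizer just answers yes or no, while a full parser would also have to extract (or share) the competing parse trees.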

