|Shallow Parsing email@example.com (1988-06-13)|
|Re: Shallow Parsing firstname.lastname@example.org (Bart Massey) (1988-06-16)|
|Re: Shallow Parsing email@example.com (1988-06-17)|
|Date:||17 Jun 88 19:08:37 GMT|
|From:||firstname.lastname@example.org (Steven Ryan)|
|Organization:||INTERGRAPH (APD) -- Palo Alto, CA|
In article <1114@ima.ISC.COM> Bart Massey <email@example.com> writes:
>I don't believe that the above argument about time is correct. My guess is
>that a human parsing *sentences of text* is much like a compiler parsing
>*sequences of statements*... Certainly such a behavior will occur in
>"linear time" on the number of sequences parsed. In fact, this is probably
>part of the reason why languages have statements and texts have sentences.
Yes, I'm thinking that we divide the stream into sentence-like entities.
In spoken language we use pauses to signify a break (and catch our breaths)
and variety of gestures and eye movements. In written language we endmarks
and paragraphs. Programming languages use begin/end/if/... keywords that
break out the overall structures.
Dividing the stream could be done in context free, deterministic fashion. It
may be the contents of each sentence are parsed differently.
>the semantics of the language which will determine how parse time grows with
>*program* length. But I'm not sure what this tells me -- in particular,
>both humans and compilers seem to do quite well at recovering information
>about distant tokens from memory in effectively *constant time* for the
I don't know about humans, but most compilers use a hash table for the symbol
table. In this case the time to identify one symbol out of n distinct symbols
is O(n/m) for a hash table size m. If m is larger than n, it is constant time.
However if n is large enough, the hash table linearly decreases search time,
but the search down any hash entry chain remains n (usually) or log n (for a
>Does it pay to recognize this idiom seperately from other for() loops? Why
>or why not?
Also, I'm wonderring about the tradeoff of determinstic versus nondeterministic
grammars or languages or parsing. For those unfamilar with these terms, if a
parser merrily trips through string left to right, given everything it is
seen so far, looking at the current symbol at maybe k symbols to the right, does
it know exactly which production it is in? If this true for a fixed k, the
parser is deterministic. If it requires arbritrarily large k (the lookahead),
it is nondeterminstic. Or it might be ambiguous. (Deterministic langauges
are not inherently ambiguous: each has an LR(1) grammar and LR(1) grammars are
>[I always thought that computers use unambiguous context free grammars because
>we know how to parse them fast, and because they seem empirically to be
>convenient for humans to try to express things like computer programs. I
>also think that something like the current parsing methods is probably
>time-optimal for conventional computer architectures.
I suspect it has more to do parse generators: a parser can be automatically
generated for an LR(1) grammar, but not so for a nondeterministical language.
(Personal remark:) I think Wirth has crippled a generation of programming
language by stripping them of anything that makes life tough for a
compiler writer (my occupation if you haven't already guessed).
> Ned Irons looked at
>context-free parsing on small numbers of parallel processors and decided that
I first thought about this with respect to CDC Cyber 205 and ETA-10x vector
machines. If somebody can find a way to vectorise parsing, I think it will
open up a vast field of applications which currently appear to be scalar or
Thank you for your support,
[Using Earley's algorithm you can parse ambigious grammars without trouble,
although it is much slower than LR or LALR for the grammars that the latter
two can handle. -John]
[From firstname.lastname@example.org (Steven Ryan)]
Return to the
Search the comp.compilers archives again.