Related articles |
---|
How to extract grammar from a program? rahulj@iitk.ac.in (Rahul Jain) (1999-02-05) |
Re: How to extract grammar from a program? cfc@world.std.com (Chris F Clark) (1999-02-10) |
Re: How to extract grammar from a program? derekross@fisheracre.freeserve.co.uk (Derek Ross) (1999-02-12) |
Re: How to extract grammar from a program? eodell@pobox.com (1999-02-15) |
Re: How to extract grammar from a program? dib@dera.gov.uk (David Bruce) (1999-02-15) |
Re: How to extract grammar from a program? spencer@cc.gatech.edu (1999-02-15) |
Re: How to extract grammar from a program? hunk@alpha1.csd.uwm.edu (1999-02-15) |
Re: How to extract grammar from a program? hunk@alpha1.csd.uwm.edu (1999-02-15) |
Re: How to extract grammar from a program? x@wins.uva.nl (1999-12-09) |
From: | hunk@alpha1.csd.uwm.edu (Mark William Hopkins) |
Newsgroups: | comp.compilers |
Date: | 15 Feb 1999 23:30:46 -0500 |
Organization: | University of Wisconsin - Milwaukee, Computing Services Division |
References: | 99-02-025 99-02-049 |
Keywords: | parse |
>There are formal theories that address this. However, their results
>are far from encouraging. The essential problem is that given a finite
>set of programs, there is a trivial regular expression which recognizes
>exactly those set of programs and no others.
Actually, this isn't the problem. The problem is that the approach
taken doesn't treat the problem as a statistical inference problem but
instead as a logical inference problem. It's the latter approach that
is characterized by the inability to avoid degenerating into the
result you cited:
>Namely, regexp: program-1 | program-2 | ... | program-n;
that is, thus, characterized by the fact that this even presents a problem.
In practice, it should not be much more difficult to statistically
model a set of structured inputs by a system of productions (Chomsky
Normal form A -> B C would be ideal) and in constructing any other
kind of statistical model.
For example: the best split of a given A into B C would be given by a
simple criterion, such as that of maximizing the product of
frequencies #(B) #(C) over all combinations for which A = BC. A more
general approach might be based on minimizing some
information-theoretic measure based on the sample set.
Actually, the method would serve as a compression method too. It's
really nothing more than a generalization of the Lempel-Ziv algorithm.
The hard part isn't the approach. It's the Machine Bottlenecks: its
Lack of Parallelism and its Lack of Memory, which humans have in
abundance. The approach for inducting the rules A -> B C is going to
take up a huge amount of memory because of all the table-keeping over
the combinations (it's k N^3, where N is the size of the largest
sample and k the number of samples). The number of samples that'll
need to be incorporated is large -- which gets to the other
bottleneck: the machine can only read and process one at a time, and
(even worse) only one step at a time, because it has no parallelism.
So methods are not an issue. The primitive state of current hardware
technology is.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.