Related articles |
---|
Corpus-based parser generator? pahint@eunet.be (1995-03-07) |
Corpus-based parser generator miles@minster.york.ac.uk (1995-03-16) |
Newsgroups: | comp.compilers |
From: | miles@minster.york.ac.uk |
Keywords: | parse, theory, bibliography |
Organization: | Compilers Central |
References: | 95-03-051 |
Date: | Thu, 16 Mar 1995 17:39:12 GMT |
pahint@eunet.be (Pieter Hintjens) writes:
>Does anyone know of any research on corpus-based parser generation?
>By 'corpus-based', I mean an approach that analyses a large body of
>existing, known valid code, as compared to the traditional approach
>which starts by defining the syntactic/semantic rules.
>
>Basically I want to take a whole bunch of programs in language X,
>throw these at a tool which will then generate a parser for X. The
>end-result might not be a compiler as such, perhaps just a tool
>that can say 'Yes, that looks like an X program, except for this
>and this, which are wrong'.
Well, if you mean given a set of positive examples of some language
generated by an unknown grammar, trying to identify that grammar, then
the answer is no. Gold gave a proof of this back in 1967:
@Article{ Gold67,
Author = "E. M. Gold",
Title = "Language {I}dentification to the {L}imit",
Journal = "Information and {C}ontrol",
Volume = "10",
pages = "447--474",
Year = "1967" }
Note that finite regular languages can be identified given just
postive examples.
For the case of positive and negative examples, the target
grammar can be learnt in the limit. This means you'll need a corpus
of programs you know to be syntactically incorrect. Examples of
approaches that use both positive and negative examples abound in the
machine learning literature. As a starter, check:
@inproceedings {Youn94,
author = "S. J. Young and H. H. Shih",
booktitle = "Grammatical {I}nference and {A}pplications",
title = "{C}omputer {A}ssisted {G}rammar {C}onstruction",
place = {Alicante, Spain},
publisher = "{S}pringer {V}erlag",
pages = "282--290",
year = 1994}
@inproceedings {Wrig93,
author = "Ave Wrigley",
booktitle = "Grammatical Inference Colloquium, Essex University",
title = "Parse {T}ree {N}-grams for {S}poken {L}anguage {M}odelling",
year = 1993}
@inproceedings {Naum93,
author = "Sven Naumann and {J\"{u}rgen} Schrepp",
booktitle = "Grammatical Inference Colloquium, Essex University",
title = "Grammatical inference in {DACS}",
year = 1993}
@inproceedings {Jone93,
author = "G. J. F. Jones and H. Lloyd-Thomas and J. H. Wright",
booktitle = "Grammatical Inference Colloquium, Essex University",
title = "Adaptive {S}tatistical and {G}rammar {M}odels of {L}anguage
for {A}pplications to {S}peech {R}ecognition",
year = 1993}
@inproceedings {Osbo94b,
author = "Miles Osborne and Derek Bridge",
booktitle = "Grammatical {I}nference and {A}pplications",
title = "Learning unification-based grammars
using the {Spoken English Corpus}",
place = {Alicante, Spain},
publisher = "{S}pringer {V}erlag",
pages = "260--270",
year = 1994}
Hope this helps,
Miles
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.