Corpus-based parser generator

miles@minster.york.ac.uk
Thu, 16 Mar 1995 17:39:12 GMT

          From comp.compilers

Related articles
Corpus-based parser generator? pahint@eunet.be (1995-03-07)
Corpus-based parser generator miles@minster.york.ac.uk (1995-03-16)
| List of all articles for this month |

Newsgroups: comp.compilers
From: miles@minster.york.ac.uk
Keywords: parse, theory, bibliography
Organization: Compilers Central
References: 95-03-051
Date: Thu, 16 Mar 1995 17:39:12 GMT

pahint@eunet.be (Pieter Hintjens) writes:
>Does anyone know of any research on corpus-based parser generation?
>By 'corpus-based', I mean an approach that analyses a large body of
>existing, known valid code, as compared to the traditional approach
>which starts by defining the syntactic/semantic rules.
>
>Basically I want to take a whole bunch of programs in language X,
>throw these at a tool which will then generate a parser for X. The
>end-result might not be a compiler as such, perhaps just a tool
>that can say 'Yes, that looks like an X program, except for this
>and this, which are wrong'.


Well, if you mean given a set of positive examples of some language
generated by an unknown grammar, trying to identify that grammar, then
the answer is no. Gold gave a proof of this back in 1967:


@Article{ Gold67,
                          Author = "E. M. Gold",
                          Title = "Language {I}dentification to the {L}imit",
                          Journal = "Information and {C}ontrol",
                          Volume = "10",
pages = "447--474",
                          Year = "1967" }


Note that finite regular languages can be identified given just
postive examples.


For the case of positive and negative examples, the target
grammar can be learnt in the limit. This means you'll need a corpus
of programs you know to be syntactically incorrect. Examples of
approaches that use both positive and negative examples abound in the
machine learning literature. As a starter, check:




@inproceedings {Youn94,
author = "S. J. Young and H. H. Shih",
booktitle = "Grammatical {I}nference and {A}pplications",
title = "{C}omputer {A}ssisted {G}rammar {C}onstruction",
place = {Alicante, Spain},
                publisher = "{S}pringer {V}erlag",
                pages = "282--290",
year = 1994}


@inproceedings {Wrig93,
author = "Ave Wrigley",
booktitle = "Grammatical Inference Colloquium, Essex University",
title = "Parse {T}ree {N}-grams for {S}poken {L}anguage {M}odelling",
year = 1993}


@inproceedings {Naum93,
author = "Sven Naumann and {J\"{u}rgen} Schrepp",
booktitle = "Grammatical Inference Colloquium, Essex University",
title = "Grammatical inference in {DACS}",
year = 1993}


@inproceedings {Jone93,
author = "G. J. F. Jones and H. Lloyd-Thomas and J. H. Wright",
booktitle = "Grammatical Inference Colloquium, Essex University",
title = "Adaptive {S}tatistical and {G}rammar {M}odels of {L}anguage
for {A}pplications to {S}peech {R}ecognition",
year = 1993}




@inproceedings {Osbo94b,
author = "Miles Osborne and Derek Bridge",
booktitle = "Grammatical {I}nference and {A}pplications",
title = "Learning unification-based grammars
                using the {Spoken English Corpus}",
place = {Alicante, Spain},
                publisher = "{S}pringer {V}erlag",
                pages = "260--270",
year = 1994}


Hope this helps,


Miles
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.