Re: Natural Language Parser

BGB <cr88192@hotmail.com>
Fri, 2 Oct 2015 17:53:09 -0500

          From comp.compilers


From: BGB <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Fri, 2 Oct 2015 17:53:09 -0500
Organization: albasani.net
References: 15-09-025 15-09-031
Keywords: parse
Posted-Date: 02 Oct 2015 22:34:22 EDT

On 9/29/2015 11:11 AM, Quinn Jackson wrote:
> On Mon, Sep 28, 2015 at 9:45 PM, Seima Rao <seimarao@gmail.com> wrote:
>>
>> I am looking for a C++ API based English Language Parser.
>
> I sit in front of one of these beasts every day.
>
> And yes, per John: "Parsing English or any natural language is very hard..."
>
> But hey -- someone has to do it. ;-)


IIRC, I once did a parser for an English subset, mostly by defining
the subset so that each word had only a single word type.


Likewise, a finite dictionary was used, and constraints were put on
the grammatical constructs allowed. I remember it having taken some
information from Basic English, but I forget the specifics (IIRC, it
was mostly word lists and other things, but I still had to do a little
work to figure out the parsing rules for the grammar).


With this much constraining, it wasn't really all that much different
from parsing something like a programming language, and I could use
a fairly straightforward recursive descent parser.
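
A minimal sketch of that kind of setup, in Python (this is not the
original code; the word list, the tag names, and the two grammar rules
are all made up for illustration). Each word maps to exactly one word
type, so tagging is a plain dictionary lookup and the parser never has
to guess:

```python
# Hypothetical mini-lexicon: every word has exactly one word type.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "big": "ADJ", "red": "ADJ",
    "sees": "VERB", "chases": "VERB",
}


class ParseError(Exception):
    pass


def tag(sentence):
    """Look each word up in the finite dictionary; unknown words are errors."""
    tokens = []
    for word in sentence.lower().rstrip(".").split():
        if word not in LEXICON:
            raise ParseError(f"unknown word: {word!r}")
        tokens.append((LEXICON[word], word))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Word type of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        """Consume one token of the given word type and return its word."""
        if self.peek() != kind:
            raise ParseError(f"expected {kind}, got {self.peek()}")
        word = self.tokens[self.pos][1]
        self.pos += 1
        return word

    # sentence := noun_phrase VERB noun_phrase
    def sentence(self):
        subj = self.noun_phrase()
        verb = self.expect("VERB")
        obj = self.noun_phrase()
        if self.peek() is not None:
            raise ParseError("trailing words after sentence")
        return ("S", subj, ("V", verb), obj)

    # noun_phrase := DET ADJ* NOUN
    def noun_phrase(self):
        det = self.expect("DET")
        adjs = []
        while self.peek() == "ADJ":
            adjs.append(self.expect("ADJ"))
        noun = self.expect("NOUN")
        return ("NP", det, adjs, noun)


def parse(sentence):
    return Parser(tag(sentence)).sentence()
```

Something like parse("The big dog chases a red ball.") gives back a
nested tuple for the sentence, while input outside the subset (an
unknown word, or a noun phrase with no determiner) raises ParseError
rather than producing a guess.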


Previously (before this point), I had done something similar with an
Esperanto variant, where one can (more or less) rely on the word
suffixes to disambiguate the word types and desired syntax tree. I
realized though that if you know the word types from a dictionary, the
suffixes (and the use of non-English vocabulary) are unnecessary.
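
As a rough illustration of the suffix approach (a sketch using the
standard Esperanto endings, not the variant or the code described
above; the small closed-class table and the tag names are assumptions
added for the example):

```python
# Hypothetical table for short function words, which do not follow
# the ending rules and so have to be listed explicitly.
CLOSED_CLASS = {
    "la": "DET", "kaj": "CONJ", "en": "PREP", "al": "PREP",
    "de": "PREP", "mi": "PRON", "vi": "PRON", "li": "PRON",
}

# Standard endings: -as/-is/-os/-us/-i/-u verbs, -o noun,
# -a adjective, -e adverb. Two-letter endings are checked first.
ENDINGS = [
    ("as", "VERB"), ("is", "VERB"), ("os", "VERB"), ("us", "VERB"),
    ("o", "NOUN"), ("a", "ADJ"), ("e", "ADV"),
    ("i", "VERB"), ("u", "VERB"),
]


def classify(word):
    """Return (word_type, plural, accusative) from the word's ending."""
    w = word.lower()
    if w in CLOSED_CLASS:
        return (CLOSED_CLASS[w], False, False)

    # Accusative -n is outermost, plural -j sits just inside it.
    accusative = w.endswith("n")
    if accusative:
        w = w[:-1]
        if w in CLOSED_CLASS:  # e.g. "min" = "mi" + accusative
            return (CLOSED_CLASS[w], False, True)
    plural = w.endswith("j")
    if plural:
        w = w[:-1]

    for ending, kind in ENDINGS:
        if w.endswith(ending):
            return (kind, plural, accusative)
    return ("UNKNOWN", plural, accusative)
```

With this, classify("hundojn") comes out as a plural accusative noun
without any dictionary entry for the word itself, which is the
property the suffixes buy you.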


IIRC, some of the metadata for this was sort of an English/Esperanto
mash-up (and there was a notation for sticking suffixes onto words).




From what I remember, what killed the effort at the time was that I
couldn't figure out any good semantic model to map this onto. The
problem was that language without semantics isn't particularly
useful.


You can do crude machine translation or similar, but nearly anything
"interesting" you could do would require a semantic model and some
form of rudimentary "intelligence".


Other basic forms of "AI" don't really need grammar trees either: they
respond to keywords, or to temporal associations between words
(with no respect paid to grammatical structure).
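
For example, a keyword responder can be sketched in a few lines of
Python (the rule table and the replies are invented for illustration;
word order and grammar are ignored entirely):

```python
import re

# Hypothetical keyword -> canned reply table; the first matching rule wins.
RULES = [
    ({"hello", "hi"}, "Hello there."),
    ({"parser", "grammar"}, "Parsing is hard; which language?"),
    ({"bye"}, "Goodbye."),
]

DEFAULT_REPLY = "Tell me more."


def respond(text):
    """Pick a reply based only on which keywords occur in the input."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for keywords, reply in RULES:
        if words & keywords:
            return reply
    return DEFAULT_REPLY
```

Since the input is reduced to a set of words, "my grammar is broken"
and "broken is grammar my" get the same reply.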


So, it all goes in my "stuff that can be done but lacks any obvious
use-case" bin (sort of like me trying to find a use-case for neural-nets
which isn't better served via more conventional strategies, *, and I
still can't really make object and speech recognition work to a usable
level).


Similarly, I have had issues when it comes to getting particularly
intelligible results from text-to-speech (my best results were by
mixing together diphone synthesis with a recorded list of common
words, my past attempts at formant synthesis generally not working so
well and producing mostly unintelligible results).


*: CPU power is a finite resource, and NNs tend to boil down mostly
to rather inefficiently implemented signal filters. As with genetic
programming, any "intelligent" behavior is elusive, and GP is mostly
good at finding mediocre patterns and/or a way to break the test
(though it at least does "ok" at finding and fine-tuning heuristics
for signal filters).




Likewise, a fixed-grammar parser won't really give sane results if
given free-form natural language: it would be necessary to write in the
subset of the language that the parser is able to understand.


If writing for such a parser, such a subset isn't particularly difficult,
apart from the tendency to forget about it and write phrases
outside those allowed by the grammar.


