Natural Language Parser email@example.com (Seima Rao) (2015-09-29)
Re: Natural Language Parser firstname.lastname@example.org (George Neuner) (2015-09-29)
Re: Natural Language Parser email@example.com (Quinn Jackson) (2015-09-29)
Re: Natural Language Parser firstname.lastname@example.org (2015-09-30)
Re: Natural Language Parser email@example.com (BGB) (2015-10-02)
Re: Natural Language Parser firstname.lastname@example.org (Gene Wirchenko) (2015-10-06)
Date: Fri, 2 Oct 2015 17:53:09 -0500
Posted-Date: 02 Oct 2015 22:34:22 EDT
On 9/29/2015 11:11 AM, Quinn Jackson wrote:
> On Mon, Sep 28, 2015 at 9:45 PM, Seima Rao <email@example.com> wrote:
>> I am looking for a C++ API based English Language Parser.
> I sit in front of one of these beasts every day.
> And yes, per John: "Parsing English or any natural language is very hard..."
> But hey -- someone has to do it. ;-)
IIRC, I once did a parser for an English subset, mostly by designing
the subset such that each word had only a single word type.
Likewise, a finite dictionary was used, and constraints were put on
the grammatical constructs allowed. I remember it borrowing some
material from Basic English, but I forget the specifics (IIRC, it
was mostly word lists and other things, but I still had to do a little
work to figure out the parsing rules for the grammar).
With this much constraint, it wasn't really all that different
from parsing something like a programming language, and I could use
a fairly straightforward recursive-descent parser.
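A minimal sketch (in Python, since the post shows no code) of what that approach might look like; the dictionary, grammar rules, and names here are all illustrative, not the original's:

```python
# Sketch: each word in a small fixed dictionary has exactly one word type,
# so a plain recursive-descent parser works much as it would for a
# programming language.

WORD_TYPES = {
    "the": "det", "a": "det",
    "dog": "noun", "cat": "noun", "ball": "noun",
    "sees": "verb", "chases": "verb",
}

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, text):
        # Tag each token with its (unique) word type up front.
        self.tokens = [(w, WORD_TYPES[w]) for w in text.lower().split()]
        self.pos = 0

    def expect(self, word_type):
        # Consume one token of the given type, or fail.
        if self.pos >= len(self.tokens) or self.tokens[self.pos][1] != word_type:
            raise ParseError("expected %s at token %d" % (word_type, self.pos))
        word = self.tokens[self.pos][0]
        self.pos += 1
        return word

    def noun_phrase(self):   # NP -> Det Noun
        return ("NP", self.expect("det"), self.expect("noun"))

    def sentence(self):      # S -> NP Verb NP
        tree = ("S", self.noun_phrase(), self.expect("verb"), self.noun_phrase())
        if self.pos != len(self.tokens):
            raise ParseError("trailing tokens")
        return tree

print(Parser("the dog chases a cat").sentence())
# -> ('S', ('NP', 'the', 'dog'), 'chases', ('NP', 'a', 'cat'))
```

Anything outside the constrained subset (an unknown word, or a phrase the grammar doesn't allow) simply fails to parse, which matches the "fixed-grammar" limitation discussed later in the post.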
Previously (before this point), I had done something similar with an
Esperanto variant, where one can (more or less) rely on the word
suffixes to disambiguate the word types and the desired syntax tree. I
realized, though, that if you know the word types from a dictionary,
the suffixes (and the use of non-English vocabulary) are unnecessary.
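For illustration, a tiny sketch of the suffix-driven word typing the Esperanto approach relies on (the suffix table follows standard Esperanto morphology; the function name and structure are my own, not from the post):

```python
# In Esperanto, the ending alone determines the word type, so no
# dictionary lookup is needed to tag a word.
SUFFIX_TYPES = [
    ("as", "verb"),       # present-tense verb, e.g. "kuras" (runs)
    ("o", "noun"),        # e.g. "hundo" (dog)
    ("a", "adjective"),   # e.g. "granda" (big)
    ("e", "adverb"),      # e.g. "rapide" (quickly)
]

def word_type(word):
    # Check longer suffixes first so "-as" wins over "-a".
    for suffix, wtype in SUFFIX_TYPES:
        if word.endswith(suffix):
            return wtype
    return "unknown"

print([word_type(w) for w in ["hundo", "granda", "kuras", "rapide"]])
# -> ['noun', 'adjective', 'verb', 'adverb']
```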
IIRC, some of the metadata for this was sort of an English/Esperanto
mash-up (and there was a notation for sticking suffixes onto words).
From what I remember, what killed the effort at the time was that I
couldn't figure out any good semantic model to map this onto. The
problem was that language without semantics isn't particularly useful.
You can do crude machine translation or similar, but nearly anything
"interesting" you could do would require a semantic model and some
form of rudimentary "intelligence".
Other basic forms of "AI" don't really need grammar trees either: they
can respond to keywords, or to temporal associations between words,
with no respect paid to grammatical structure.
So, it all goes in my "stuff that can be done but lacks any obvious
use-case" bin (sort of like my trying to find a use-case for neural
nets which isn't better served by more conventional strategies*; and I
still can't really make object or speech recognition work to a usable
level).
Similarly, I have had issues getting particularly intelligible results
from text-to-speech (my best results came from mixing diphone
synthesis with a recorded list of common words; my past attempts at
formant synthesis generally didn't work so well and produced mostly
unintelligible results).
*: Given that CPU power is a finite resource, and NNs tend to boil
down mostly to rather inefficiently implemented signal filters. Like
with genetic programming, any "intelligent" behavior is elusive, and
GP is mostly good at finding mediocre patterns and/or ways to break
the test (though it at least does "OK" at finding and fine-tuning
heuristics for signal filters).
Likewise, a fixed-grammar parser won't really give sane results if
given free-form natural language: it would be necessary to write in
the subset of the language that the parser is able to understand.
Writing in such a subset isn't particularly difficult, apart from
one's tendency to forget about it and write phrases outside those
allowed by the grammar.