Re: Natural Language Parser

George Neuner <>
Tue, 29 Sep 2015 17:02:46 -0400

          From comp.compilers


From: George Neuner <>
Newsgroups: comp.compilers
Date: Tue, 29 Sep 2015 17:02:46 -0400
Organization: A noiseless patient Spider
References: 15-09-025
Keywords: parse
Posted-Date: 29 Sep 2015 17:20:52 EDT

On Tue, 29 Sep 2015 06:15:22 +0530, Seima Rao <>

> I am looking for a C++ API based English Language Parser.

Sorry, I don't know of one offhand. Most I have seen have been written
either in Lisp or in Prolog.

The Stanford and Berkeley projects have parsers available in Java, but
they are very complex and may be difficult to port to C++ (if you even
would want to).

> The specific task I want to do on a *regular basis* is to
> parse English Language Documents and arrive
> at a Mathematical artefact.

There's little "mathematical" about natural language - it's a jumbled
heap of garbage through which we dig for nuggets of meaning. That it
can be mechanically understood well enough to be badly translated is
itself something of a miracle.

> The Mathematical artefact will (have) built a Syntax Tree
> of the English text that is input to the NLP parser.
> This is my only requirement. I dont need any semanticizing
> artefacts. So, my requirement is limited to parsing
> english language documents and arriving at a tree(
> or any other mathematical structure that binds the
> English grammar to the input document aka syntax trees).

You need to be aware that natural languages *can't* be parsed without
semantics - i.e. without considering "parts of speech".

Depending on context, the same word may represent, e.g., a verb, an
adverb, or even a (type of) noun. What part of speech the word
represents in context determines the ultimate meaning of the sentence.
Even which part of speech a word represents may be controversial. Most
human writing (and speaking) is quite imprecise: most sentences can be
parsed in more than one way, and the meanings of the various parsings
may be very different.
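To make the ambiguity concrete, here is a small Python sketch of a CYK
chart parser that counts how many distinct parse trees a sentence has.
The grammar and lexicon are hypothetical toys (nothing like a real
English grammar), chosen so that "saw" carries two parts of speech and
the prepositional phrase in "I saw the man with the telescope" can
attach either to the verb phrase or to the noun phrase.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies the action ("saw ... with the telescope")
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP modifies the noun ("the man with the telescope")
    ("PP", "P", "NP"),
]

# Hypothetical toy lexicon: one word may have several parts of speech.
LEXICON = {
    "i": {"NP"},
    "saw": {"V", "N"},
    "the": {"Det"},
    "man": {"N"},
    "telescope": {"N"},
    "with": {"P"},
}

def count_parses(sentence, start="S"):
    """Return the number of distinct parse trees rooted at `start`."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j] maps a nonterminal to the number of trees covering words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[i][i + 1][tag] += 1
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)

print(count_parses("I saw the man with the telescope"))  # 2
print(count_parses("I saw the man"))                     # 1
```

Even with six rules the seven-word sentence already has two readings;
with a realistic grammar and lexicon the number of candidate parses
grows quickly, which is why practical parsers rank them statistically
rather than simply enumerating them.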

> My guess is that the specific C++ API based NL parser
> will be using some english dictionary inside the tool
> to do all the jobs that are advertised.
> Can readers of this forum direct me to a stable and
> active C++ API based NL parser ?

As John mentioned already, Stanford has a good NLP project. Berkeley
also has a good one.

IMO, Princeton's WordNet is the most comprehensive English language
POS database available. It has a C library interface. There are some
toy projects that demonstrate using it, but Princeton has focused
primarily on the database itself rather than on developing tools that
use it.

> I have zero experience in natural language parsing(compiling)
> and zero experience in using such tools.
> However, I intend to maintain internally the source code
> of the toolkit via whatever version control software
> that is used by the developers of the tool so that
> I am able to get regular updates and not break anything.

Most so-called "NLP" systems depend on operating within a very limited
scope - e.g., needing to "understand" only the commands and data
objects of a single program. General-purpose NLP is supercomputer
territory [think IBM's Watson] ... Siri and Cortana, etc. - systems
which _appear_ to understand language - really are keyword driven and
don't actually understand anything at all.
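As a rough illustration of what "keyword driven" means, here is a toy
Python intent matcher for a single hypothetical program. The intent
names and keyword sets are invented for this sketch; it is not how Siri
or Cortana are actually built, only a demonstration of the principle
that matching words is not the same as understanding them.

```python
import re

# Hypothetical command vocabulary for one program: intent -> trigger keywords.
INTENTS = {
    "open_file": {"open", "load"},
    "save_file": {"save", "store"},
    "quit": {"quit", "exit", "close"},
}

def match_intent(utterance):
    """Pick the intent whose keywords overlap the utterance most; None if no hits."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    best, best_hits = None, 0
    for intent, keywords in INTENTS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:   # ties keep the earlier intent
            best, best_hits = intent, hits
    return best

print(match_intent("Please save my work"))             # save_file
print(match_intent("don't save anything, just quit"))  # save_file (negation ignored)
print(match_intent("what's the weather"))              # None
```

The second call shows the limitation: the matcher sees "save" and
happily saves, because nothing in it models what "don't" means.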

If you intend for this to be some kind of general-purpose tool, then
you need to do a LOT more research before you start.

