The speed of an Earley parser?

newspub@wuggy.co.uk (Ian Woods)
17 Jun 2001 15:35:25 -0400

          From comp.compilers

Related articles
The speed of an Earley parser? newspub@wuggy.co.uk (2001-06-17)
Re: The speed of an Earley parser? sting@linguist.Dartmouth.EDU (Michael J. Fromberger) (2001-06-21)
Re: The speed of an Earley parser? joachim_d@gmx.de (Joachim Durchholz) (2001-06-21)
Re: The speed of an Earley parser? newspub@wuggy.co.uk (2001-06-28)
Re: The speed of an Earley parser? idbaxter@semdesigns.com (Ira D. Baxter) (2001-07-02)
Re: The speed of an Earley parser? news0@greynode.net (Benjamin S.Scarlet) (2001-08-06)
Re: The speed of an Earley parser? Mark.van.den.Brand@cwi.nl (M.G.J. van den Brand) (2001-08-08)
[1 later articles]
| List of all articles for this month |

From: newspub@wuggy.co.uk (Ian Woods)
Newsgroups: comp.compilers
Date: 17 Jun 2001 15:35:25 -0400
Organization: (Posted via) GTS Netcom - Public USENET Service http://pubnews.netcom.net.uk
Keywords: parse, comment
Posted-Date: 17 Jun 2001 15:35:25 EDT

I'm working on a semi-learning, semi-serious project which requires
some CF parsing. There are three major considerations on the selection
of the parsing technique to be used:


o It mustn't substantially restrict the layout of the grammar. It's
necessary for the grammar to be easily read and understood by humans,
and forcing a lot of restrictions makes their life harder.


o The parsing operation must be completed in a reasonable period -
we're not competing for speed, just the user shouldn't have died and
decomposed before the parsing is completed. Well, preferably a bit
better than that :)


o This one takes a little explanation. Imagine a language construct a
little like this:


// language foo


lang "bar"
(


    (* this is language bar! *)


)


// more lang foo


You can define a block but instead of it being some fixed language,
the language is specified as part of the 'host' languages construct.


(note, the open and close paren after 'lang "bar"' are part of bar, not foo...
this gets rid of possible ambiguity between the grammars of bar and foo.


This kind of thing, I imagine, either requires the construction of a
meta grammar which contains all of the grammars for all of the
languages with sufficient syntax for the above (which is probably a
very bad idea since it will be huge)... or I need to be able to
'switch' from one grammar to another and splice their parse trees
together (which seems a much better idea to me). This switching
operation requires that the grammar is effectively a data structure in
memory rather than encoded into a program (like most of the common
tools seem to produce - programs which can parse one specific
grammar).


The question is this:


What parsing techniques have the properties I require?


To me, Earley's parser seems to me a good bet: it has 2 of the requirements.
It's only the speed which I'm concerned about. Although I'm likely to be alive
by the time it finishes parsing a few thousand tokens against a realistic
programming language grammar, is the time taken going to be to long (in the
order of hours)?


Should I simply restrict the grammars to those able to be parsed by
LALR(k) and just follow the crowd? Or is there some other technique
which I haven't ran into yet which may be better than either?


Ian Woods.
[Go ahead and use an Earley parser. For practical languages, it'll almost
certainly be fast enough. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.