Related articles |
---|
The speed of an Earley parser? newspub@wuggy.co.uk (2001-06-17) |
Re: The speed of an Earley parser? sting@linguist.Dartmouth.EDU (Michael J. Fromberger) (2001-06-21) |
Re: The speed of an Earley parser? joachim_d@gmx.de (Joachim Durchholz) (2001-06-21) |
Re: The speed of an Earley parser? newspub@wuggy.co.uk (2001-06-28) |
Re: The speed of an Earley parser? idbaxter@semdesigns.com (Ira D. Baxter) (2001-07-02) |
Re: The speed of an Earley parser? news0@greynode.net (Benjamin S.Scarlet) (2001-08-06) |
Re: The speed of an Earley parser? Mark.van.den.Brand@cwi.nl (M.G.J. van den Brand) (2001-08-08) |
[1 later articles] |
From: | newspub@wuggy.co.uk (Ian Woods) |
Newsgroups: | comp.compilers |
Date: | 17 Jun 2001 15:35:25 -0400 |
Organization: | (Posted via) GTS Netcom - Public USENET Service http://pubnews.netcom.net.uk |
Keywords: | parse, comment |
Posted-Date: | 17 Jun 2001 15:35:25 EDT |
I'm working on a semi-learning, semi-serious project which requires
some CF parsing. There are three major considerations on the selection
of the parsing technique to be used:
o It mustn't substantially restrict the layout of the grammar. It's
necessary for the grammar to be easily read and understood by humans,
and forcing a lot of restrictions makes their life harder.
o The parsing operation must be completed in a reasonable period -
we're not competing for speed, just the user shouldn't have died and
decomposed before the parsing is completed. Well, preferably a bit
better than that :)
o This one takes a little explanation. Imagine a language construct a
little like this:
// language foo
lang "bar"
(
(* this is language bar! *)
)
// more lang foo
You can define a block but instead of it being some fixed language,
the language is specified as part of the 'host' languages construct.
(note, the open and close paren after 'lang "bar"' are part of bar, not foo...
this gets rid of possible ambiguity between the grammars of bar and foo.
This kind of thing, I imagine, either requires the construction of a
meta grammar which contains all of the grammars for all of the
languages with sufficient syntax for the above (which is probably a
very bad idea since it will be huge)... or I need to be able to
'switch' from one grammar to another and splice their parse trees
together (which seems a much better idea to me). This switching
operation requires that the grammar is effectively a data structure in
memory rather than encoded into a program (like most of the common
tools seem to produce - programs which can parse one specific
grammar).
The question is this:
What parsing techniques have the properties I require?
To me, Earley's parser seems to me a good bet: it has 2 of the requirements.
It's only the speed which I'm concerned about. Although I'm likely to be alive
by the time it finishes parsing a few thousand tokens against a realistic
programming language grammar, is the time taken going to be to long (in the
order of hours)?
Should I simply restrict the grammars to those able to be parsed by
LALR(k) and just follow the crowd? Or is there some other technique
which I haven't ran into yet which may be better than either?
Ian Woods.
[Go ahead and use an Earley parser. For practical languages, it'll almost
certainly be fast enough. -John]
Return to the
comp.compilers page.
Search the
comp.compilers archives again.