Re: The speed of an Earley parser?

"M.G.J. van den Brand" <Mark.van.den.Brand@cwi.nl>
8 Aug 2001 01:14:28 -0400

From comp.compilers

Related articles
The speed of an Earley parser? newspub@wuggy.co.uk (2001-06-17)
Re: The speed of an Earley parser? sting@linguist.Dartmouth.EDU (Michael J. Fromberger) (2001-06-21)
Re: The speed of an Earley parser? joachim_d@gmx.de (Joachim Durchholz) (2001-06-21)
Re: The speed of an Earley parser? newspub@wuggy.co.uk (2001-06-28)
Re: The speed of an Earley parser? idbaxter@semdesigns.com (Ira D. Baxter) (2001-07-02)
Re: The speed of an Earley parser? news0@greynode.net (Benjamin S.Scarlet) (2001-08-06)
*Re: The speed of an Earley parser? Mark.van.den.Brand@cwi.nl (M.G.J. van den Brand)* (2001-08-08)**
Re: The speed of an Earley parser? holzmueller@ics-ag.de (2001-08-15)

| List of all articles for this month |

From:	"M.G.J. van den Brand" <Mark.van.den.Brand@cwi.nl>
Newsgroups:	comp.compilers
Date:	8 Aug 2001 01:14:28 -0400
Organization:	CWI, Amsterdam
References:	01-06-041 01-08-027
Keywords:	parse, performance
Posted-Date:	08 Aug 2001 01:14:28 EDT

> > The question is this:
> >
> > What parsing techniques have the properties I require?

A quite powerful parsing technique is Generalized LR parsing, there
are various papers on Generalized LR parsing and various
algorithms/implementations. I think that the famoust one is by
Tomita. Although other people came up with various improvements and
extensions, e.g., Rekers and Visser. Lately I came across a very nice
and clear description of GLR parsing by Scott and Johnstone. I will
add some references at the end of this message.

>
> >
> > To me, Earley's parser seems to me a good bet: it has 2 of the
> > requirements. It's only the speed which I'm concerned about. Although I'm
> > likely to be alive by the time it finishes parsing a few thousand tokens
> > against a realistic programming language grammar, is the time taken going
> > to be to long (in the order of hours)?

We use a GLR implementatin in a system (the ASF+SDF Meta-Environment,
http://www.cwi.nl/projects/MetaEnv/ ) which is designed and used,
among others, for language prototyping (DSLs) and reverse engineering
(ancient languages, COBOL, PLI, etc, with strange grammars). The
advantage of our system is that we have a modular language definition
formalism (SDF) and this is possible because of the fact that GLR does
not impose any restrictions on the context-free grammars. So, it is
possible to perform the union context-free grammars, which is not the
case when you are using an LALR parser. The performance of our GLR
implementation is really good, we use it both in an interactive
environment and on the command line, e.g., to parse huge COBOL
programs. Our implementation of GLR is also used in other systems,
e.g. ELAN (http://www.loria.fr/equipes/protheo/SOFTWARES/ELAN/),
because of the grammatical complexity of the ELAN language.

>
> >
> > Should I simply restrict the grammars to those able to be parsed by
> > LALR(k) and just follow the crowd? Or is there some other technique
> > which I haven't ran into yet which may be better than either?

No, it is not really necessary to restrict your grammar, at least that
is our experience.

Some references:

@Book{Tom85,
    key = {Tom85},
    author = {Tomita, Masura},
    title = {{E}fficient {P}arsing for {N}atural {L}anguages. A Fast Algorithm
    for Practical Systems},
    year = {1985},
    publisher = {Kluwer Academic Publishers}
}

@PhdThesis{Rek92,
    key = {Rek92},
    author = {J. Rekers},
    title = {Parser Generation for Interactive Environments},
    year = {1992},
    school = {University of Amsterdam},
    note = {ftp://ftp.cwi.nl/pub/gipe/reports/Rek92.ps.Z},
    URL = {ftp://ftp.cwi.nl/pub/gipe/reports/Rek92.ps.Z}
}

@InProceedings{AH99,
    author = {Aycock, John and Horspool, Nigel},
    title = {Faster Generalized {LR} Parsing},
    booktitle = {Compiler Construction (CC'99)},
    pages = {32--46},
    year = 1999,
    editor = {S. J\"ahnichen},
    volume = 1575,
    series = {Lecture Notes in Computer Science},
    address = {Amsterdam, The Netherlands},
    month = {March},
    publisher = {Springer-Verlag}
}

@InProceedings{Lan74,
    key = {Lan74},
    author = {B. Lang},
    title = {{D}eterministic techniques for efficient non-deterministic parsers},

    booktitle = {Proceedings of the Second Colloquium on Automata, Languages and
    Programming},
    editor = {J. Loeckx},
    year = {1974},
    series = {Lecture Notes in Computer Science},
    volume = {14},
    pages = {255-269},
    organization = {Springer-Verlag}
}

@TechReport{SJH00,
    key = {SJH00},
    author = {E. Scott and A. Johnstone and S.S. Hussain},
    titel = {Tomita-{S}tyle {G}eneralised {LR} {P}arsers},
    number = {TR-00-12},
    year = {2000},
    institution = {Royal Holloway, University of London, Computer Science
Department}
}

@phdthesis{Vis97,
    key = {Vis97},
    author = {E. Visser},
    title = {Syntax Definition for Language Prototyping},
    school = {University of Amsterdam},
    year = {1997},
    note = {http://www.cs.uu.nl/~visser/publications/index.html}
}

@inproceedings = {BV01,
        author = "M.G.J. van den Brand and C. Ringeissen",
        title = "ASF+SDF parsing tools applied to ELAN",
        booktitle = "Electronic Notes in Theoretical Computer Science",
        volume = "36",
        publisher = "Elsevier Science Publishers",
        editor = "Kokichi Futatsugi",
        year = "2001"
}

----------------------------------------------------------------
M.G.J. van den Brand,
Department of Software Engineering
CWI
Kruislaan 413, NL-1098 SJ AMSTERDAM, The Netherlands.

Tel___(+31) 20 5924213 WWW____http://www.cwi.nl/~markvdb/
Fax___(+31) 20 5924199 Email__Mark.van.den.Brand@cwi.nl
----------------------------------------------------------------

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: The speed of an Earley parser?

"M.G.J. van den Brand" <Mark.van.den.Brand@cwi.nl>8 Aug 2001 01:14:28 -0400

"M.G.J. van den Brand" <Mark.van.den.Brand@cwi.nl>
8 Aug 2001 01:14:28 -0400