Related articles |
---|
Speeding up LEX scanning times pahint@eunet.be (1995-02-02) |
Re: Speeding up LEX scanning times dimock@das.harvard.edu (1995-02-02) |
Re: Speeding up LEX scanning times c1veeru@WATSON.IBM.COM (Virendra K. Mehta) (1995-02-02) |
Re: Speeding up LEX scanning times monnier@di.epfl.ch (Stefan Monnier) (1995-02-03) |
Re: Speeding up LEX scanning times mercier@hollywood.cinenet.net (1995-02-03) |
Re: Speeding up LEX scanning times vern@daffy.ee.lbl.gov (1995-02-04) |
Re: Speeding up LEX scanning times eifrig@beanworld.cs.jhu.edu (1995-02-07) |
Newsgroups: | comp.compilers |
From: | eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) |
Keywords: | parse, lex |
Organization: | The Johns Hopkins University CS Department |
References: | 95-02-041 95-02-057 |
Date: | Tue, 7 Feb 1995 16:49:19 GMT |
Virendra K. Mehta <c1veeru@WATSON.IBM.COM> wrote:
> COBOL lexical analyzers can be pretty slow because of the very complicated
> regular expressions involved.... These generate a sizable dfa and can slow
> the scanner down a lot.
Vern Paxson <vern@daffy.ee.lbl.gov> wrote:
>This is a common misconception (because it's so intuitive) regarding
>regular expression matching when using DFA's. A key point when using DFA's
>is that the run time is *independent* of the complexity of the RE. It's
>instead just linear in the number of characters scanned, since the DFA
>makes exactly one state transition per character, and figuring out which
>transition to make takes constant time.
Well, this is a good example of Moving Theory Into Practice
Axiom Number 1: "Not all constants are created equal."
The problem is that traversing a given arc in a DFA a *second* time
can be significantly faster than the first time, due to caching effects.
Reads from the cache are typically 2-3 times faster than reads from the
main memory. Typically, DFA acceptors are coded up as:
state = START_STATE;
while (!final_states[state]) {
token = input();
state = next_state[state][token];
}
(In practice, the "next_state" array is usually distributed as the jump
tables resulting from a bunch of "switch" statements, one for each state,
but the effect is the same.)
Reading from the "next_state" array is a significant part of the
time taken each pass through the loop. If the other parts of the loop
are fast, cache misses can have dramatic effects.
>So basically do as much scanning as possible using RE's: with a proper
>DFA implementation they're very fast.
But sometimes other techniques are faster. :-)
Jack Eifrig (eifrig@cs.jhu.edu) The Johns Hopkins University, C.S. Dept.
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.