Re: Speeding up LEX scanning times

vern@daffy.ee.lbl.gov (Vern Paxson)
Sat, 4 Feb 1995 05:25:25 GMT

          From comp.compilers

Related articles
Speeding up LEX scanning times pahint@eunet.be (1995-02-02)
Re: Speeding up LEX scanning times dimock@das.harvard.edu (1995-02-02)
Re: Speeding up LEX scanning times c1veeru@WATSON.IBM.COM (Virendra K. Mehta) (1995-02-02)
Re: Speeding up LEX scanning times monnier@di.epfl.ch (Stefan Monnier) (1995-02-03)
Re: Speeding up LEX scanning times mercier@hollywood.cinenet.net (1995-02-03)
Re: Speeding up LEX scanning times vern@daffy.ee.lbl.gov (1995-02-04)
Re: Speeding up LEX scanning times eifrig@beanworld.cs.jhu.edu (1995-02-07)
| List of all articles for this month |

Newsgroups: comp.compilers
From: vern@daffy.ee.lbl.gov (Vern Paxson)
Keywords: Cobol, parse, lex
Organization: Lawrence Berkeley Laboratory, Berkeley CA
References: 95-02-041
Date: Sat, 4 Feb 1995 05:25:25 GMT

Virendra K. Mehta <c1veeru@WATSON.IBM.COM> wrote:
> COBOL lexical analyzers can be pretty slow because of the very complicated
> regular expressions involved.... These generate a sizable dfa and can slow
> the scanner down a lot.


This is a common misconception (because it's so intuitive) regarding
regular expression matching when using DFA's. A key point when using DFA's
is that the run time is *independent* of the complexity of the RE. It's
instead just linear in the number of characters scanned, since the DFA
makes exactly one state transition per character, and figuring out which
transition to make takes constant time.


A couple of fine points: the more complex the RE, the more complex the
DFA (generally), so for really hairy RE's scanning will be a bit slower
due to paging in the DFA tables, cache misses, and the like. Also, it's
possible to have lex rules that require the scanner to scan further than
the end of the token, in order to ensure that the input doesn't match a
longer token. These sorts of rules result in scanning portions of the
input more than once, which slows down the scanner accordingly. They can
occur with both simple and complicated RE's. Flex has an option (-b) to
report when your rules have this property (they usually do) and provide
some guidance in modifying your rules to eliminate the need to back up.
Ironically, the way you eliminate the back up, and thus speed up your
scanner, is by adding more rules!


So basically do as much scanning as possible using RE's: with a proper
DFA implementation they're very fast.


Vern
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.