How can the speed of a scanner be independent of the number of rules?

Roger L Costello <costello@mitre.org>
Wed, 23 Mar 2022 18:49:39 +0000

          From comp.compilers

Related articles
How can the speed of a scanner be independent of the number of rules? costello@mitre.org (Roger L Costello) (2022-03-23)
| List of all articles for this month |

From: Roger L Costello <costello@mitre.org>
Newsgroups: comp.compilers
Date: Wed, 23 Mar 2022 18:49:39 +0000
Organization: Compilers Central
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="39843"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, question, comment
Posted-Date: 23 Mar 2022 15:24:41 EDT
Content-Language: en-US

Hi Folks,


On page 48 of the Flex manual [1] it says this amazing thing:


Note that adding rules does not slow down the scanner! The speed of the
scanner is independent of the number of rules or (modulo the considerations
given at the beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.


That is amazing! And counterintuitive. How can it possibly be that a scanner
containing 1000 rules can operate as fast as a scanner containing 10 rules?
Would you give some intuition to help me understand this, please?


/Roger


[1] https://epaperpress.com/lexandyacc/download/flex.pdf


[Flex compiles the rules into a finite state machine. When the scanner
runs, it just looks up each character it reads in the table for the current
state to decide what to do. Creating the state tables for 1000 rules takes
a lot longer than creating the tables for 10 rules, but that just happens
once when you build the scanner, not when it's running.
For more details on regular expressions and state machines, see any compiler
textbook. It's one of the standrd topics. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.