Making Generated Scanners Faster.

" Stephen P. Butler" <>
30 Jun 1996 16:54:08 -0400

          From comp.compilers

From: " Stephen P. Butler" <>
Newsgroups: comp.compilers
Date: 30 Jun 1996 16:54:08 -0400
Organization: Compilers Central
Keywords: lex, performance

I've been following the recent discussion on hand coded versus machine
generated scanners closely with a certain amount of professional interest.

Part of the research I'm doing for my PhD is aimed at improving the
performance of the scanner for the RDP compiler-compiler. In
particular we'd like to produce a version which has the following
properties:

                a) Programmable (unlike the scanner for RDP 1.X which can
                      only return a small but useful number of token types).

                b) Integrated into the existing RDP syntax so that
                      describing the lexical structure of your language is
                      just an extension of describing the syntactic
                      structure. This would avoid the need to learn the
                      syntax of two different tools as is currently the
                      case if you're using a combination such as flex and
                      bison etc.

                c) Can have semantic actions embedded into the lexical
                      analysis phase in exactly the same way as they can be
                      embedded into the syntax analysis phase.

                d) Faster than, or at the very least, close enough to the
                      speed of hand coded scanners so that the temptation to
                      write these by hand and subsequently have to debug and
                      maintain them at this level is removed.

                e) Maintain the current portability of RDP and its
                      generated parsers and scanners such that they should
                      compile and work on any system that has sufficient
                      (hopefully still modest) resources and an ANSI/ISO C
                      conforming compiler and library.
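
For concreteness, a hand coded scanner of the kind referred to in (d)
might be built around a character class table, roughly as in the sketch
below. This is only a minimal illustration in ANSI C, not RDP's scanner;
the token kinds and the names class_of, next_token and count_tokens are
made up for the example, and it assumes 8-bit chars and a NUL-terminated
input buffer.

```c
/* Sketch of a hand coded scanner: one table lookup per character.
   All names and token kinds here are illustrative assumptions. */
#include <assert.h>
#include <stddef.h>

enum token_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT };
enum char_class { CC_OTHER, CC_SPACE, CC_LETTER, CC_DIGIT };

struct token {
    enum token_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
};

static unsigned char class_of[256];  /* assumes 8-bit unsigned char */
static int table_ready = 0;

/* Fill the class table once; strings are used for letters so that no
   contiguous alphabet is assumed. */
static void init_table(void)
{
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_";
    const char *spaces = " \t\n\r\f\v";
    const char *s;
    int c;

    for (c = 0; c < 256; c++) class_of[c] = CC_OTHER;
    for (s = letters; *s != '\0'; s++) class_of[(unsigned char)*s] = CC_LETTER;
    for (s = spaces; *s != '\0'; s++) class_of[(unsigned char)*s] = CC_SPACE;
    for (c = '0'; c <= '9'; c++) class_of[c] = CC_DIGIT;
    table_ready = 1;
}

/* Return the next token and advance *cursor past it. The NUL terminator
   has class CC_OTHER, so every loop below stops at end of input. */
static struct token next_token(const char **cursor)
{
    const unsigned char *p;
    struct token tok;

    assert(cursor != NULL && *cursor != NULL);
    if (!table_ready) init_table();

    p = (const unsigned char *)*cursor;
    while (class_of[*p] == CC_SPACE) p++;          /* skip white space */

    tok.start = (const char *)p;
    switch (class_of[*p]) {                        /* dispatch on first char */
    case CC_LETTER:
        do { p++; } while (class_of[*p] == CC_LETTER || class_of[*p] == CC_DIGIT);
        tok.kind = TOK_IDENT;
        break;
    case CC_DIGIT:
        do { p++; } while (class_of[*p] == CC_DIGIT);
        tok.kind = TOK_NUMBER;
        break;
    default:
        if (*p == '\0') {
            tok.kind = TOK_EOF;                    /* do not move past NUL */
        } else {
            p++;                                   /* single-character token */
            tok.kind = TOK_PUNCT;
        }
        break;
    }
    tok.length = (size_t)(p - (const unsigned char *)tok.start);
    *cursor = (const char *)p;
    return tok;
}

/* Convenience helper: count the tokens before end of input. */
static size_t count_tokens(const char *src)
{
    size_t n = 0;
    while (next_token(&src).kind != TOK_EOF) n++;
    return n;
}
```

Calling count_tokens("x1 = 42;") walks the four tokens x1, =, 42 and ;.
A real scanner would add keywords, comments, strings and error handling
on top of this, and buffer its input rather than assume it is all in
memory.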

As part of this work, I'm currently tracking down as many scanner
generators as I can find, with the intention of doing a comprehensive
performance analysis of them. This has two purposes: i) to publish
the results to allow further discussion, research and experimentation,
and ii) to determine what makes certain generators faster or slower
than others. The ultimate aim, of course, is to use the results to
improve the scanner for RDP.

Obviously, part of this exercise must be to determine the relative
performance of machine generated scanners against carefully written
and optimised hand coded scanners, particularly since the performance
gains claimed for hand coded over machine generated scanners vary so
much. Hence my motivation for posting to comp.compilers.
What I'd like to ask people is:

                a) What "stunts" are people pulling in their hand coded
                      scanners that they feel make them faster than an
                      equivalent machine generated scanner?

                b) Do you have a hand coded scanner for a language that
                      you feel particularly proud of and have carefully
                      optimised? Much of my work would be speeded up
                      considerably if people were willing to let me borrow
                      existing scanners for comparative testing, since it'll
                      save me a lot of the time needed to write and optimise
                      my own, and it's likely that these will be a better
                      representation of hand coded scanners.

                      Currently, I'm particularly interested in scanners for C,
                      Pascal and Oberon/Oberon 2. I'd also be interested in
                      scanners for other languages as well though, particularly
                      if you believe your optimisations are really good or the
                      lexical structure of the language may be a good test of
                      machine generated scanner's performance (Except perhaps

Thanks in advance for your help,
Stephen P. Butler.  | Department of Computer Science,
                    | Royal Holloway, University of London,
                    | Egham, Surrey TW20 0EX England.
