Re: Lex surrogates

gmdka! (Josef Grosch)
Fri, 17 Feb 89 15:24:15 -0100

          From comp.compilers

Related articles
[6 earlier articles]
Re: Lex surrogates (1989-02-09)
Re: Lex surrogates tower@bu-cs.BU.EDU (1989-02-10)
Re: Lex surrogates (1989-02-11)
Re: Lex surrogates (Henry Spencer) (1989-02-11)
Re: Lex surrogates (Ric Holt) (1989-02-13)
Re: Lex surrogates (1989-02-16)
Re: Lex surrogates gmdka! (1989-02-17)
| List of all articles for this month |

Date: Fri, 17 Feb 89 15:24:15 -0100
From: gmdka! (Josef Grosch)

1. Van Jacobson's loop

In <> Rich Salz <rsalz@BBN.COM> writes:
> Van Jacobson (also of LBL) spent a few some time tuning lex ...
> he was able to get the innermost loop to compile into a single 68000
> instructions. ...
> lex was just a shade slower than cat(1). Van's changes were never
> made available, ...

I would really like to see this one instruction loop!
Otherwise it sounds like an unbelievable rumour, because the fastest
hand-coded loop for scanning that I can imagine has 4 instructions.
For a general automaton I as well as others need at least 5 instructions.

2. Rex: a fast scanner generator

Rex is like flex an improved remake of Lex. Additional features are the
automatic calculation of the line and column position of every token and a
mechanism to handle include files. REJECT has been sacrificed.
Rex generated scanners are always 20-30% faster than the fastest flex
version. The tables are always as small as possible. Depending on the
problem the table size can be only 5-77% and the generation time only
50-70% of the flex figures. For more info see below. Some figures:

                                                  Lex Flex Rex
speed [lines/min.]
      without hashing 36,400 139,000 182,700
      with hashing 34,700 118,000 141,400
table size [bytes] 39,200 57,300 4,400
scanner size [bytes] 43,800 64,100 11,200
generation time [sec.] 73.7 7.2 4.9

3. Lalr + Ell: fast parser generators

Not only Lex is slow, Yacc also. We have an Yacc remake called Lalr
that generates parsers that are 2 to 3 times faster. Moreover automatic
error handling is provided which includes error reporting, recovery, and
repair. Ell generated parsers beat Yacc by a factor of 4. Ell processes
LL(1) grammars and generates conventional recursive descent parsers.
For more info see below. Some figures:

                                        Bison Yacc PGS Lalr Ell
speed [lines/min.] 105,000 184,000 200,000 385,000 700,000
table size [bytes] 8,004 10,364 11,268 11,795 -
parser size [bytes] 11,136 12,548 17,616 17,416 14,344
gen. time [sec.] 5.0 19.6 69.5 29.6 6.4

4. I'm wondering that nobody is talking about ALEX:

%A H. M\\*:ossenb\\*:ock
%T Alex - A Simple and Efficient Scanner Generator
%V 21
%N 12
%P 139-148
%D DEC 1986

5. Announcement

            Rex, Lalr and Ell - Compiler Construction Tools

          Rex (Regular EXpression tool) is a scanner generator whose
specifications are based on regular expressions and arbitrary
semantic actions written in one of the target languages C or
Modula-2. As scanners sometimes have to consider the context to
unambiguously recognize a token the right context can be speci-
fied by an additional regular expression and the left context can
be handled by so-called start states. The generated scanners
automatically compute the line and column position of the tokens
and offer an efficient mechanism to normalize identifiers and
keywords to upper or lower case letters. The scanners are table-
driven and run at a speed of 180,000 to 195,000 lines per minute
on a MC 68020 processor.

          Lalr is a LALR(1) parser generator accepting grammars writ-
ten in extended BNF notation which may be augmented by semantic
actions expressed by statements of the target language. The gen-
erator provides a mechanism for S-attribution, that is syn-
thesized attributes can be computed during parsing. In case of
LR-conflicts unlike other tools Lalr provides not only informa-
tion about an internal state consisting of a set of items but it
prints a derivation tree which is much more useful to analyze the
problem. Conflicts can be resolved by specifying precedence and
associativity of operators and productions. The generated parsers
include automatic error recovery, error messages, and error
repair. The parsers are table-driven and run at a speed of
400,000 lines per minute. Currently parsers can be generated in
the target languages C and Modula-2.

          Ell is a LL(1) parser generator accepting the same specifi-
cation language as Lalr except that the grammars must obey the
LL(1) property. It is possible to evaluate an L-attribution
during parsing. The generated parsers include automatic error
recovery, error messages, and error repair like Lalr. The
parsers are implemented following the recursive descent method
and reach a speed of 700,000 lines per minute. The possible tar-
get languages are again C and Modula-2.

          All the tools are implemented in Modula-2.
A comparison of the above tools with the corresponding UNIX
tools shows that significant improvements have been achieved:
Rex generated scanners are 4 times faster than those of LEX.
Lalr generated parsers are 2-3 times faster than those of YACC.
Ell generated parsers are 4 times faster than those of YACC.

The tools are public domain. They have been developed in 1987.
They have been tested by generating scanners and parsers for
e. g. Pascal, Modula, Oberon, Ada and found stable.

The tools have been developed using our own Modula-2 compiler called
MOCKA on a MC 68020 based UNIX workstation. They have been ported
to the SUN workstation and been compiled successfully using the SUN
Modula-2 compiler. They also run on VAX/BSD UNIX and VAX/ULTRIX machines.
This should assure a reasonable level of portability.

I would be pleased to send the programs to everybody that is interested.
To minimize the costs (at least for me) I suggest to send a blank SUN
streamer tape. 1/2" magtapes are possible, too, but cause more work for me.
I will return the tape with the sources organized as a UNIX file tree.
(tar format)
It would be nice if you could include a cheque of US $ 10 or equivalent
to cover return postage.

The tape will contain user manuals for each tool.
Papers about Rex and Lalr will be published soon.

I have submitted the sources to the usenet newsgroup comp.sources.unix.

Josef Grosch
GMD - Forschungsstelle an der Universitaet Karlsruhe
Haid-und-Neu-Str. 7
D-7500 Karlsruhe 1
West Germany

Phone: +721-662226
Email: grosch@gmdka.UUCP

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.