Related articles |
---|
LEX behaviour when given "large" automata. phs@lifia.imag.fr (1988-03-03) |
Re: LEX behaviour when given "large" automata. rsalz@BBN.COM (Rich Salz) (1988-03-18) |
Re: LEX behaviour when given "large" automata. chris@mimsy.UUCP (1988-03-20) |
Re: LEX behaviour when given "large" automata. lbl-helios!vern@lbl-rtsg.arpa (Vern Paxson) (1988-03-18) |
Re: LEX behaviour when given "large" automata. sargas.usc.edu!tli@oberon.usc.edu (1988-03-20) |
From: | chris@mimsy.UUCP (Chris Torek) |
Newsgroups: | comp.compilers,comp.lang.c,comp.unix.questions |
Date: | 20 Mar 88 05:30:17 GMT |
References: | <911@ima.ISC.COM> <914@ima.ISC.COM> |
Organization: | U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 |
Several people suggested replacing lex description such as
if return (IF);
then return (THEN);
else return (ELSE);
{id} { symenter(yytext); return (ID); }
with
{id} { /* first try it as a keyword */
k = keyword_index(yytext); if (k >= 0) return (k);
/* not a keyword, must be an identifier */
symenter(yytext); return (ID); }
This may (probably does) help in lex, but it is clearly the wrong way
around in almost all cases. The lexer has to traverse a DFA to decide
that something is an ID. While it is at it it could easily tell
whether it is a keyword instead. Obviously this makes the DFA table
larger, but the result should run faster.
It turns out (as Van Jacobson discovered) that lex uses a long
subroutine---on the order of 200 lines---to do what is essentially
described by the loop
/* follow the DFA */
for (state = begin; state[c].v == c; c = *in++)
state += state[c].n;
(best case) or
/*
* Pick the right start state depending on whether we are at
* the beginning of a line.
*/
state = newline ? hat_begin : begin;
while (state[c].v == c) {
/* follow the DFA */
state += state[c].n;
c = *in++;
/* remember action state (for right context) */
if (state->v) {
a_s = state;
a_c = in;
}
}
in = a_c;
if (a_s->n) {
in -= a_s->n; /* back up over right context */
c = in[-1];
}
newline = in[-2]=='\n'; /* remember whether we are now at ^ */
in[-1] = '\0'; /* kill residual */
(worst case). This pretty much handles everything except YYREJECT.
I do not know whether the LBL `flex' (Fast LEX) handles YYREJECT now.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.