Related articles |
---|
Is there a better flex ? thuerman@ibr.cs.tu-bs.de (1991-07-08) |
Re: Is there a better flex ? vern@horse.ee.lbl.gov (1991-07-26) |
Newsgroups: | comp.compilers |
From: | vern@horse.ee.lbl.gov (Vern Paxson) |
Keywords: | flex |
Organization: | Lawrence Berkeley Laboratory, Berkeley |
References: | 91-07-030 |
Date: | Fri, 26 Jul 91 03:34:41 GMT |
In article 91-07-030 thuerman@ibr.cs.tu-bs.de (Urs Thuermann) writes:
> ... I found out that flex, after converting the nfa to a dfa,
> doesn't reduce the dfa, i.e. the dfa which flex outputs has equivalent
> states which could be replaced by one state. For example, given the
> regular expression for a C comment
>
> "/*"\/*([^*/]\/*|\*)*"*/"
>
> flex generates an automaton with 8 states while the reduced automaton
> would have only 5 states.
The fundamental emphasis behind the flex design is it be as efficient as
possible for large, "real-world" scanners such as those for compilers. In
general this means that I didn't worry about wasting a few extra bytes in
table or code space, but I did worry about wasting any extra cycles
(particularly in the scanner run-time).
Flex does not do state minimization because it is my suspicion that with
large scanners the reduction in scanner size will be quite small. State
minimization also does not improve run-time performance (other than due to
better locality of the state tables) and will increase the scanner
generation time (unless so many states are eliminated that the time spent
compiling the result more than balances the time spent computing the
minimization).
> .... Another optimization I would like flex to do is generating ONE
> accepting state for several rules with the same action ...
This is also usually only a win for small scanners. The present mechanism
of generating separate accepting numbers for each of the rules joined with
a '|' action was adopted because it was easiest to implement - one just
omits a "break" for any rule whose action is '|'. Again, it would be
surprising if this behavior makes a difference for a large scanner, since
at most it adds a few extra cases in the main switch statement and a few
extra entries in the table mapping final state to accepting number.
> .... Maybe I also found a bug in flex: If I write
>
> [!%&()*+,-./:;<>?[\]^{}|~] return (yytext[0]);
>
> one equivalence class for these 23 characters will be built, but if I
> write
>
> !|%|&|\(|\)|\*|\+|,|-|\.|\/|:|;|\<|\>|\?|\[|\]|^|\{|\||\}|~
> return (yytext[0]);
>
>there is an equivalence class for each character.
I'd long wondered if anyone would ever complain about that! Here again
the reason for flex's behavior is ease of implementation. It's quite easy
to update a set of equivalence classes given a character class, but it's
much hairier to do it for a general NFA, which is what the parser is going
to turn the second rule into. The assumption I made was that in general
the user would use character classes when possible since they're convenient;
and when they don't, the characters that might be equivalent in one context
(such as the above rule) will prove to be inequivalent in some other context.
Also, if one is using the default table compression (or the -Cm option),
the above characters will still fall into the same "meta-equivalence" class
and their transitions will be compressed in the generated scanner anyway.
> Does anybody know of a version of flex which performs the optimizations
> mentioned above and if so, where can I get it.
No, but you might want to upgrade to release 2.3.7, which is substantially
newer than the 1987 version you're using. You can get it via anonymous ftp
to ftp.ee.lbl.gov (128.3.112.20; retrieve flex-2.3.7.tar.Z, using binary
mode), or most likely from your nearby GNU archives.
If you don't have anonymous ftp access, let me know and I'll mail you the
uuencoded tar file.
Vern
Vern Paxson vern@ee.lbl.gov
Computer Systems Engineering ucbvax!ee.lbl.gov!vern
Lawrence Berkeley Laboratory (415) 486-7504
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.