Re: Regex to DFA

"Wolfram Fenske" <int2k@gmx.net>
30 Sep 2006 17:43:39 -0400

          From comp.compilers

Related articles
Regex to DFA benhanson2@icqmail.com (2006-09-25)
Re: Regex to DFA int2k@gmx.net (Wolfram Fenske) (2006-09-30)
Re: Regex to DFA benhanson2@icqmail.com (2006-10-06)
| List of all articles for this month |

From: "Wolfram Fenske" <int2k@gmx.net>
Newsgroups: comp.compilers
Date: 30 Sep 2006 17:43:39 -0400
Organization: http://groups.google.com
References: 06-09-131
Keywords: lex, DFA
Posted-Date: 30 Sep 2006 17:43:39 EDT

benhanson2@icqmail.com schreibt:


> Hello,
>
> I am writing a lexer generator library in C++
> (http://www.benhanson.net/lexertl.html). I am half way through the new
> version which uses the equivalence class technique from flex to speed
> up DFA creation. So far I have generated the regex syntax tree,
> partitioned all the regex tokens and built the lookup vector to map
> input characters to character sets.
>
> I'm wondering what the fastest technique is for implementing the syntax
> tree to DFA conversion as described in the Dragon Book. In previous
> versions I have done it by creating
>
> std::map<std::set<node *>, size_t>
>
> This means that as a set of nodes representing the next DFA state is
> built I can lookup in the map fairly quickly to see if it has already
> been seen. However, the more containers I use, the more over-head
> there is when these temporary data structures are destroyed.


I can't really follow all of your posting, but if speed is your
concern, you probably shouldn't use std::map, at least not in this
way:


    1. Maps are implemented as AVL trees, which means you have O(log(n))
          complexity, not O(1) as with hash maps. OTOH if the sets are
          small (less than, say, 10 to 20 elements), constant factors
          become more important, and AVL trees might be faster than hash
          maps after all. You'd have to do profiling to find out what
          works best.


    2. If your map is defined like this


                  std::map<std::set<node *>, size_t>


          the keys in the map will be copies of the original keys, which is
          probably expensive because your keys are sets. Using pointers
          would be faster but also makes things more complicated
          (e. g. memory management and comparison of map keys).


> Is there a much simpler C language approach that I could use that
> would be more efficient?


If you really need maps and sets, I'd say no.


> The source code to flex hasn't been particularly illuminating and I
> found Vern Paxson's explanations on this group much more helpful,
> but these were from the late eighties/early nineties!


But remember that DFAs have been around for a long time, too.


Wolfram


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.