Related articles |
---|
Regex to DFA benhanson2@icqmail.com (2006-09-25) |
Re: Regex to DFA int2k@gmx.net (Wolfram Fenske) (2006-09-30) |
Re: Regex to DFA benhanson2@icqmail.com (2006-10-06) |
From: | "Wolfram Fenske" <int2k@gmx.net> |
Newsgroups: | comp.compilers |
Date: | 30 Sep 2006 17:43:39 -0400 |
Organization: | http://groups.google.com |
References: | 06-09-131 |
Keywords: | lex, DFA |
Posted-Date: | 30 Sep 2006 17:43:39 EDT |
benhanson2@icqmail.com schreibt:
> Hello,
>
> I am writing a lexer generator library in C++
> (http://www.benhanson.net/lexertl.html). I am half way through the new
> version which uses the equivalence class technique from flex to speed
> up DFA creation. So far I have generated the regex syntax tree,
> partitioned all the regex tokens and built the lookup vector to map
> input characters to character sets.
>
> I'm wondering what the fastest technique is for implementing the syntax
> tree to DFA conversion as described in the Dragon Book. In previous
> versions I have done it by creating
>
> std::map<std::set<node *>, size_t>
>
> This means that as a set of nodes representing the next DFA state is
> built I can lookup in the map fairly quickly to see if it has already
> been seen. However, the more containers I use, the more over-head
> there is when these temporary data structures are destroyed.
I can't really follow all of your posting, but if speed is your
concern, you probably shouldn't use std::map, at least not in this
way:
1. Maps are implemented as AVL trees, which means you have O(log(n))
complexity, not O(1) as with hash maps. OTOH if the sets are
small (less than, say, 10 to 20 elements), constant factors
become more important, and AVL trees might be faster than hash
maps after all. You'd have to do profiling to find out what
works best.
2. If your map is defined like this
std::map<std::set<node *>, size_t>
the keys in the map will be copies of the original keys, which is
probably expensive because your keys are sets. Using pointers
would be faster but also makes things more complicated
(e. g. memory management and comparison of map keys).
> Is there a much simpler C language approach that I could use that
> would be more efficient?
If you really need maps and sets, I'd say no.
> The source code to flex hasn't been particularly illuminating and I
> found Vern Paxson's explanations on this group much more helpful,
> but these were from the late eighties/early nineties!
But remember that DFAs have been around for a long time, too.
Wolfram
Return to the
comp.compilers page.
Search the
comp.compilers archives again.