Re: Regex to DFA

benhanson2@icqmail.com
6 Oct 2006 17:15:42 -0400

          From comp.compilers

Related articles
Regex to DFA benhanson2@icqmail.com (2006-09-25)
Re: Regex to DFA int2k@gmx.net (Wolfram Fenske) (2006-09-30)
Re: Regex to DFA benhanson2@icqmail.com (2006-10-06)
| List of all articles for this month |

From: benhanson2@icqmail.com
Newsgroups: comp.compilers
Date: 6 Oct 2006 17:15:42 -0400
Organization: Compilers Central
References: 06-09-13106-09-165
Keywords: lex, DFA
Posted-Date: 06 Oct 2006 17:15:42 EDT

Wolfram Fenske wrote:
> benhanson2@icqmail.com schreibt:
>
> > Hello,
> >
> > I am writing a lexer generator library in C++
> > (http://www.benhanson.net/lexertl.html). ...


> > I'm wondering what the fastest technique is for implementing the syntax
> > tree to DFA conversion as described in the Dragon Book. In previous
> > versions I have done it by creating
> >
> > std::map<std::set<node *>, size_t>
> > ...
>
> I can't really follow all of your posting, but if speed is your
> concern, you probably shouldn't use std::map, at least not in this
> way:
>
> 1. Maps are implemented as AVL trees, which means you have O(log(n))
> complexity, not O(1) as with hash maps. OTOH if the sets are
> small (less than, say, 10 to 20 elements), constant factors
> become more important, and AVL trees might be faster than hash
> maps after all. You'd have to do profiling to find out what
> works best.
>
> 2. If your map is defined like this
>
> std::map<std::set<node *>, size_t>
>
> the keys in the map will be copies of the original keys, which is
> probably expensive because your keys are sets. Using pointers
> would be faster but also makes things more complicated
> (e. g. memory management and comparison of map keys).
>
> > Is there a much simpler C language approach that I could use that
> > would be more efficient?
>
> If you really need maps and sets, I'd say no.
>
> > The source code to flex hasn't been particularly illuminating and I
> > found Vern Paxson's explanations on this group much more helpful,
> > but these were from the late eighties/early nineties!
>
> But remember that DFAs have been around for a long time, too.
>
> Wolfram


I've settled for std::vector<std::set<node *> *> and a linear search.
By using *(*iter) == *curr_set (i.e. not just comparing pointers!) it
turns out this is fast enough due to the implementation of the set ==
operator (checks sizes for equality first).


Regards,


Ben



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.