6 Oct 2006 17:15:42 -0400

Related articles |
---|

Regex to DFA benhanson2@icqmail.com (2006-09-25) |

Re: Regex to DFA int2k@gmx.net (Wolfram Fenske) (2006-09-30) |

Re: Regex to DFA benhanson2@icqmail.com (2006-10-06) |

From: | benhanson2@icqmail.com |

Newsgroups: | comp.compilers |

Date: | 6 Oct 2006 17:15:42 -0400 |

Organization: | Compilers Central |

References: | 06-09-13106-09-165 |

Keywords: | lex, DFA |

Wolfram Fenske wrote:

*> benhanson2@icqmail.com schreibt:*

*>*

*> > Hello,*

*> >*

*> > I am writing a lexer generator library in C++*

*> > (http://www.benhanson.net/lexertl.html). ...*

*> > I'm wondering what the fastest technique is for implementing the syntax*

*> > tree to DFA conversion as described in the Dragon Book. In previous*

*> > versions I have done it by creating*

*> >*

*> > std::map<std::set<node *>, size_t>*

*> > ...*

*>*

*> I can't really follow all of your posting, but if speed is your*

*> concern, you probably shouldn't use std::map, at least not in this*

*> way:*

*>*

*> 1. Maps are implemented as AVL trees, which means you have O(log(n))*

*> complexity, not O(1) as with hash maps. OTOH if the sets are*

*> small (less than, say, 10 to 20 elements), constant factors*

*> become more important, and AVL trees might be faster than hash*

*> maps after all. You'd have to do profiling to find out what*

*> works best.*

*>*

*> 2. If your map is defined like this*

*>*

*> std::map<std::set<node *>, size_t>*

*>*

*> the keys in the map will be copies of the original keys, which is*

*> probably expensive because your keys are sets. Using pointers*

*> would be faster but also makes things more complicated*

*> (e. g. memory management and comparison of map keys).*

*>*

*> > Is there a much simpler C language approach that I could use that*

*> > would be more efficient?*

*>*

*> If you really need maps and sets, I'd say no.*

*>*

*> > The source code to flex hasn't been particularly illuminating and I*

*> > found Vern Paxson's explanations on this group much more helpful,*

*> > but these were from the late eighties/early nineties!*

*>*

*> But remember that DFAs have been around for a long time, too.*

*>*

*> Wolfram*

I've settled for std::vector<std::set<node *> *> and a linear search.

By using *(*iter) == *curr_set (i.e. not just comparing pointers!) it

turns out this is fast enough due to the implementation of the set ==

operator (checks sizes for equality first).

Regards,

Ben

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.