Lexical scanners? firstname.lastname@example.org (1998-03-30)
Re: Lexical scanners? email@example.com (Christian Wetzel) (1998-04-03)
Re: Lexical scanners? firstname.lastname@example.org (1998-04-09)
Date: 30 Mar 1998 21:33:29 -0500
Keywords: lex, i18n, question
I need a lexical scanner which...
a) uses a large symbol set (Unicode), and
b) can have new tokens added dynamically.
Ignoring requirement (b), for the moment, are there any tools
available which can construct (with reasonable space and time
efficiency) a lexical scanner for large character sets?
Are there any tools available which satisfy both requirements?
Failing that, does anyone have any suggestions for construction?
What is a reasonable level of performance for such a scanner
(both with and without requirement (b))?
I have put together a tentative C++ OO framework for such a lexical
scanner, but don't really want to write everything if it can be avoided
(not to mention that C++ OO might not be the best tool for the job).
Tentative performance, where each source line is treated as a token, is
approximately 10.5 seconds to read and scan a 50,000,000 character
file (but with the file cached so there is no disk I/O overhead).
This is on a 200 MHz PPro with 640 MB of RAM. Is this really good,
really bad or average performance for this type of problem?
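For scale, those figures work out to a little under five million
characters per second; a quick check of the arithmetic:

```cpp
#include <cassert>

// 50,000,000 characters scanned in 10.5 seconds (figures quoted
// above); the division is the only computation here.
double chars_per_second = 50000000.0 / 10.5;  // roughly 4.76 million
```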
Feel free to e-mail me directly if you would rather not post to
the newsgroup, or if you want a conversation.
Michael Lee Finney
[In previous discussions, the best suggestion I've seen for large character
sets is to observe that although the number of characters is large, the
number of syntactically different character types is a lot smaller and so
to do an ad-hoc map of characters into type codes and run the lex on the
type codes. -John]