From: | Roger L Costello <costello@mitre.org> |
Newsgroups: | comp.compilers |
Date: | Thu, 7 Jul 2022 17:49:44 +0000 |
Organization: | Compilers Central |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="18181"; mail-complaints-to="abuse@iecc.com" |
Keywords: | lex, question, comment |
Posted-Date: | 11 Jul 2022 20:26:04 EDT |
Content-Language: | en-US |
Hi Folks,
For months I have been immersed in learning and using Flex. Great fun indeed.
But recently I have been reading a book, Crafting a Compiler with C, and
reading its chapter on lexers. The chapter describes two lexer-generators:
ScanGen and Lex. Oh my! Learning ScanGen opened my eyes to the hidden
assumptions in Lex/Flex. Without learning ScanGen I would have continued to
think that the way things are done in Lex/Flex is the only way.
Below I have documented some of the differences between Lex/Flex and ScanGen.
Difference:
- Flex allows overlapping regexes. It is up to Flex to use the 'correct'
regex. Flex has rules for picking the correct one: longest match wins, regex
listed first wins.
- ScanGen does not allow overlapping regexes. Instead, you create one regex
and then, if needed, you create "Except" clauses. E.g., the token is an
Identifier, except if the token is 'Begin' or 'End' or 'Read' or 'Write'
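To make the contrast concrete, here is a rough Python sketch of the two approaches. It is only an illustration, not Flex or ScanGen code: the rule list, the function names, and the use of Python's re module are all my own assumptions.

```python
import re

# Hypothetical rules in priority order, the way a Flex file would list
# keyword patterns ahead of the general identifier pattern.
RULES = [
    ("BEGIN", re.compile(r"Begin")),
    ("END", re.compile(r"End")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]

def flex_style(text, pos=0):
    """Overlapping rules: longest match wins, first-listed rule wins ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so an earlier rule keeps a tie.
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best[0], text[pos:best[1]]

# ScanGen-like idea: one identifier pattern plus a table of exceptions.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")
EXCEPT = {"Begin": "BEGIN", "End": "END"}

def except_style(text, pos=0):
    """One non-overlapping rule; reserved words are looked up afterwards."""
    m = IDENT.match(text, pos)
    if m is None:
        raise ValueError(f"no identifier at position {pos}")
    lexeme = m.group()
    return EXCEPT.get(lexeme, "ID"), lexeme
```

Both functions classify 'Begin' as a keyword and 'Beginner' as an identifier. The first gets there through tie-breaking among overlapping rules, the second through a lookup after a single match.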
Difference:
- Flex regexes use juxtaposition for specifying concatenation.
- ScanGen uses '.' to specify concatenation. And oh by the way, ScanGen calls
it 'catenation' not 'concatenation'
Difference:
- Flex regexes use | for specifying alternation
- ScanGen uses ',' to specify alternation
Difference:
- With Flex, tossing out characters (e.g., toss out the quotes surrounding a
string) may involve writing C code to reprocess the token
- ScanGen has a 'Toss' command to toss out a character, e.g., Quote(Toss). No
token reprocessing needed
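Here is a small Python sketch of what I mean by reprocessing versus tossing. It is not ScanGen syntax; the function names are made up, and the first function only imitates what a Flex action would do to yytext in C.

```python
import re

# Hypothetical string-literal pattern: a quote, non-quote characters, a quote.
STRING = re.compile(r'"[^"\n]*"')

def reprocess_style(text, pos=0):
    """Match the literal with its quotes, then strip them in a second step."""
    m = STRING.match(text, pos)
    if m is None:
        raise ValueError("no string literal here")
    matched = m.group()      # plays the role of yytext, quotes included
    return matched[1:-1]     # reprocessing: drop the surrounding quotes

def toss_style(text, pos=0):
    """Decide keep-or-toss for each character as it is scanned."""
    if pos >= len(text) or text[pos] != '"':
        raise ValueError("no string literal here")
    kept = []
    i = pos + 1              # opening quote: consumed but tossed
    while i < len(text) and text[i] not in '"\n':
        kept.append(text[i]) # ordinary character: consumed and kept
        i += 1
    if i >= len(text) or text[i] != '"':
        raise ValueError("unterminated string literal")
    return "".join(kept)     # closing quote: consumed but tossed
```

Both return the same token text. The difference is that the second never builds a token that includes the quotes.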
Difference:
- Flex regexes use ^ for specifying 'not', e.g., [^ab] means any char except a
and b
- ScanGen regexes use 'Not', e.g., Not(Quote)
Difference:
- Flex deals with individual characters
- ScanGen lumps characters into character classes and deals with classes. Use
of character classes decreases (quite significantly) the size of the
transition table
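To see why classes shrink the table, consider this toy scanner in Python. The class names, the states, and the 128-character alphabet are assumptions I picked for the illustration; none of it comes from ScanGen itself.

```python
# Hypothetical character classes for a tiny scanner; the names are illustrative.
CLASSES = ["letter", "digit", "blank", "other"]

def char_class(ch):
    """Map one character to the class that indexes the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch in " \t\n":
        return "blank"
    return "other"

# States: 0 = start, 1 = inside an identifier, 2 = inside an integer.
# A missing entry means there is no transition on that class.
TABLE = {
    0: {"letter": 1, "digit": 2},
    1: {"letter": 1, "digit": 1},
    2: {"digit": 2},
}
ACCEPTING = {1: "ID", 2: "INT"}

def scan(text, pos=0):
    """Run the DFA from pos and return (token name, lexeme)."""
    state, i = 0, pos
    while i < len(text):
        nxt = TABLE[state].get(char_class(text[i]))
        if nxt is None:
            break
        state, i = nxt, i + 1
    if state not in ACCEPTING:
        raise ValueError(f"no token at position {pos}")
    return ACCEPTING[state], text[pos:i]

# Dense table sizes: one column per 7-bit character versus one per class.
PER_CHARACTER_ENTRIES = len(TABLE) * 128
PER_CLASS_ENTRIES = len(TABLE) * len(CLASSES)
```

With three states, a dense table with one column per character needs 384 entries, while the class-indexed one needs 12. The price is one extra lookup per character to find its class.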
Difference:
- Flex regexes use the ? meta-symbol
- ScanGen doesn't have that. Instead, it has 'Epsilon'
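My understanding is that an optional item is the same as an alternation between the item and the empty string, which is what Epsilon stands for. The Python sketch below checks that equivalence with the re module. The patterns are my own example, and how ScanGen actually spells this is not shown here.

```python
import re

# '?' marks the sign as optional.
OPTIONAL = re.compile(r"-?[0-9]+")
# Same language, written as "a sign, or the empty string (epsilon)".
WITH_EPSILON = re.compile(r"(?:-|)[0-9]+")

def same_language(samples):
    """True if both patterns accept or reject every sample alike."""
    return all(
        bool(OPTIONAL.fullmatch(s)) == bool(WITH_EPSILON.fullmatch(s))
        for s in samples
    )
```

Calling same_language on a handful of signed and unsigned integers, plus a few malformed strings, is a quick way to convince yourself that the two spellings agree.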
Difference:
- ScanGen has something called a Major number and a Minor number for each
token
- Flex doesn't have that concept
[For the same reason, I don't think it's a good idea to learn only one programming language. -John]