Re: Please provide a learning path for mastering lexical analysis languages

Paul B Mann <parser.generator.guy@gmail.com>
Sun, 8 May 2022 22:27:55 -0700 (PDT)

          From comp.compilers


From: Paul B Mann <parser.generator.guy@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 8 May 2022 22:27:55 -0700 (PDT)
Organization: Compilers Central
References: 22-05-010 22-05-023
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="66705"; mail-complaints-to="abuse@iecc.com"
Keywords: lex
Posted-Date: 13 May 2022 13:03:24 EDT
In-Reply-To: 22-05-023

/* Token Rules */


<eof> -> \z


<constant> -> literal
-> integer
-> decimal
-> real


<identifier> -> letter (letter|digit)*


integer -> digit+


real -> integer exp
-> decimal exp


decimal -> digit+ '.'
-> '.' digit+
-> digit+ '.' digit+


exp -> 'e' digit+
-> 'E' digit+
-> 'e' '-' digit+
-> 'E' '-' digit+
-> 'e' '+' digit+
-> 'E' '+' digit+


literal -> ''' lchar '''


lchar -> lany
-> '\' '\'
-> '\' '''
-> '\' '"'
-> '\' 'n'
-> '\' 't'
-> '\' 'a'
-> '\' 'b'
-> '\' 'f'
-> '\' 'r'
-> '\' 'v'
-> '\' '0'


<string> -> '"' schar* '"'


schar -> sany
-> '\' '\'
-> '\' '''
-> '\' '"'
-> '\' 'n'
-> '\' 't'
-> '\' 'a'
-> '\' 'b'
-> '\' 'f'
-> '\' 'r'
-> '\' 'v'
-> '\' '0'


{whitespace} -> whitechar+


{commentline} -> '/' '/' neol*


{commentblock} -> '/' '*' na* '*'+ (nans na* '*'+)* '/'


/* Character Sets */


any = 0..255 - \z
lany = any - ''' - '\' - \n
sany = any - '"' - '\' - \n


letter = 'a'..'z' | 'A'..'Z' | '_'
digit = '0'..'9'


whitechar = \t | \n | \r | \f | \v | ' '


na = any - '*' // not asterisk
nans = any - '*' - '/' // not asterisk not slash
neol = any - \n // not end of line


\t = 9 // tab
\n = 10 // newline
\v = 11 // vertical tab
\f = 12 // form feed
\r = 13 // return
\z = 26 // end of file
\b = 32 // blank/space


/* End */


The above lexical rules define C-language symbols.
It's just a lexical grammar, not too hard to figure out.
This is input to the DFA lexer generator, which is provided
with the LRSTAR parser generator on SourceForge.net.
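
For readers without LRSTAR installed, here is a rough sketch of how the token rules above could be transcribed into Python regular expressions. This is my own illustration, not output from DFA and not its API: the names ESCAPE, RULES and tokenize are made up for this example. A generated DFA picks the longest match, while this loop picks the first rule that matches, so the order of the rules (real before decimal before integer) is an assumption that stands in for longest-match.

```python
import re

# Escapes shared by lchar and schar: \\ \' \" \n \t \a \b \f \r \v \0
ESCAPE = r"\\[\\'\"ntabfrv0]"
DIGITS = r"[0-9]+"
EXP = r"[eE][+-]?[0-9]+"                     # exp
DECIMAL = r"(?:[0-9]+\.[0-9]*|\.[0-9]+)"     # decimal

# (kind, pattern); kind None marks the ignored {…} tokens.
# \x1a is \z (26), which the grammar removes from `any`.
RULES = [(kind, re.compile(pattern)) for kind, pattern in [
    (None, r"[\t\n\r\f\v ]+"),                                   # {whitespace}
    (None, r"//[^\n\x1a]*"),                                     # {commentline}
    (None, r"/\*[^*\x1a]*\*+(?:[^*/\x1a][^*\x1a]*\*+)*/"),       # {commentblock}
    ("eof", r"\x1a"),                                            # <eof>
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),                   # <identifier>
    ("constant", r"(?:" + DECIMAL + r"|" + DIGITS + r")" + EXP), # real
    ("constant", DECIMAL),                                       # decimal
    ("constant", DIGITS),                                        # integer
    ("constant", r"'(?:[^'\\\n\x1a]|" + ESCAPE + r")'"),         # literal
    ("string", r'"(?:[^"\\\n\x1a]|' + ESCAPE + r')*"'),          # <string>
]]

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping ignored tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        for kind, regex in RULES:
            m = regex.match(text, pos)
            if m:
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if kind is not None:
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

The grammar above has no operator or punctuation tokens, so this sketch rejects characters such as '=' or ';', just as the rules as posted would.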


DFA creates lexers that run 80% faster than "flex" lexers
and are about the same size.
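
To give an idea of what a DFA-based lexer does at run time, here is a hand-written state machine for the {commentblock} rule. It is only a sketch of the technique: the state names and the functions step and match_comment are invented for this example, and the code is not what DFA actually generates. It also says nothing about speed.

```python
# Hand-written DFA for:  '/' '*' na* '*'+ (nans na* '*'+)* '/'
# State names are illustrative, not taken from LRSTAR/DFA output.
START, SLASH, BODY, STARS, DONE = range(5)

def step(state, ch):
    """Return the next state, or None when there is no transition."""
    if ch == "\x1a":              # \z (26) is excluded from `any`
        return None
    if state == START:
        return SLASH if ch == "/" else None
    if state == SLASH:
        return BODY if ch == "*" else None
    if state == BODY:             # na*: loop on anything but '*'
        return STARS if ch == "*" else BODY
    if state == STARS:            # '*'+, then '/' ends it, nans resumes the body
        if ch == "*":
            return STARS
        if ch == "/":
            return DONE
        return BODY
    return None                   # DONE has no outgoing transitions

def match_comment(text, pos=0):
    """Return the offset just past a block comment starting at pos, or -1."""
    state = START
    for i in range(pos, len(text)):
        state = step(state, text[i])
        if state is None:
            return -1
        if state == DONE:
            return i + 1
    return -1                     # input ended inside the comment
```

Each input character costs one state transition and nothing is ever re-read, which is the property a generated DFA lexer relies on.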


If you need more language power to define a lexer ...
that's what parsers are for.
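
One common example of needing more power is nested block comments, which C does not have but some other languages do. Balanced nesting cannot be expressed with regular lexical rules like the ones above, because matching requires counting. A minimal sketch, assuming a hypothetical language whose /* ... */ comments nest (the function name skip_nested_comment is made up for this example):

```python
def skip_nested_comment(text, pos=0):
    """Return the offset just past a possibly nested /* ... */ comment, or -1."""
    if not text.startswith("/*", pos):
        return -1
    depth = 0                     # the counter a plain DFA cannot keep
    i = pos
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1
            i += 2
        elif text.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1                     # input ended before the comment closed
```

The depth counter plays the role of the stack that a parser brings and a finite automaton lacks.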


BTW, LRSTAR creates C++ parsers that ran 140 times faster
than those created by ANTLR (using its C++ target) the last
time I did a comparison, 2 years ago.

