From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 22 Sep 2002 12:15:53 -0400
Organization: Compilers Central
References: 02-09-087 02-09-110 02-09-121
Keywords: lex, design
Posted-Date: 22 Sep 2002 12:15:53 EDT
Clint Olsen wrote:
> Joachim Durchholz wrote:
>
>>Contrary to common wisdom, I believe that the lexer should
>>not really [do checking]
>
> But you do agree that it's the lexer's job to check comments,
> right?
That depends.
If your software will simply discard comments, it's easiest to
discard them in the lexer.
If comments are just opaque blobs of text, you're still best off
doing comment recognition by hand.
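A minimal sketch of the discard case, in Python (the function name is
mine, and I assume non-nesting C-style /* */ comments):

    def skip_comment(src, i):
        # Assumes src[i:i+2] == "/*" and that comments do not nest.
        end = src.find("*/", i + 2)
        if end == -1:
            raise SyntaxError("unterminated comment")
        return end + 2   # lexer resumes here; the comment is gone

After seeing "/*", the lexer just calls i = skip_comment(src, i) and
carries on tokenizing; nothing of the comment survives.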
Things begin to change once you want to do more with the contents of
comments. For example, documentation extraction tools expect language
entities in some comments; most languages require string detection
within comments; or you may be writing a language processing toolkit
and want to lex the comment contents anyway, because you foresee that
some editor will want to do syntax highlighting even within comments.
For full generality, it can be advantageous to split the lexer into
several levels (a sketch of the lowest level follows the list):
- Reader. Character set conversion, line ending conventions.
  Also keeps track of line and column numbers.
- Tokenizer. Groups characters into tokens.
  In particular, does string recognition (for nestable comments,
  strings must be recognized to avoid mis-lexing stuff like /* "*/" */).
- Comment recognition.
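Here is that sketch of the Reader level, in Python (the class and
method names are my own, and the line-ending normalization shown is
just one plausible convention):

    class Reader:
        # Level 1: line-ending normalization plus line/column tracking.
        def __init__(self, text):
            # Normalize CRLF and bare CR to LF (an assumed convention).
            self.text = text.replace("\r\n", "\n").replace("\r", "\n")
            self.pos, self.line, self.col = 0, 1, 1

        def get(self):
            # Return the next character, or "" at end of input.
            if self.pos >= len(self.text):
                return ""
            ch = self.text[self.pos]
            self.pos += 1
            if ch == "\n":
                self.line, self.col = self.line + 1, 1
            else:
                self.col += 1
            return ch

The tokenizer sits on top of this and asks the Reader for characters,
so every token it builds can be stamped with the line and column where
it began.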
> How do you expect to be able to represent comments in a CFG?
  comment ::= "/*" {token} "*/"
  token   ::= comment
            | string | integer | ...   (literals)
            | "if" | "then" | ...      (keywords)
            | <error>
(The <error> token is meant to be whatever the lexer returns if a
character sequence is not a legal token.)
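A direct recursive transcription of that grammar, again as a Python
sketch; here anything between string literals is skipped character by
character rather than fully tokenized, which is the minimum needed to
dodge the /* "*/" */ trap:

    def lex_comment(src, i):
        # Assumes src[i:i+2] == "/*"; returns the index just past the
        # matching "*/". Comments may nest.
        i += 2
        while i < len(src):
            if src.startswith("/*", i):    # nested comment: recurse
                i = lex_comment(src, i)
            elif src.startswith("*/", i):  # end of this comment
                return i + 2
            elif src[i] == '"':            # skip a whole string literal,
                i = lex_string(src, i)     # so "*/" inside it is inert
            else:
                i += 1
        raise SyntaxError("unterminated comment")

    def lex_string(src, i):
        # Assumes src[i] == '"'; returns the index just past the
        # closing quote.
        i += 1
        while i < len(src):
            if src[i] == "\\":             # skip the escaped character
                i += 2
            elif src[i] == '"':
                return i + 1
            else:
                i += 1
        raise SyntaxError("unterminated string")

On the input /* "*/" */ the string "*/" is consumed whole by
lex_string, so the comment correctly ends at the final */.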
Regards,
Joachim