Re: What should be check in Lexical Analyzer along with generating tokens?

"Lex Spoon" <lex@cc.gatech.edu>
25 Oct 2002 00:14:29 -0400

          From comp.compilers

From: "Lex Spoon" <lex@cc.gatech.edu>
Newsgroups: comp.compilers
Date: 25 Oct 2002 00:14:29 -0400
Organization: College of Computing, Georgia Tech
References: 02-09-087 02-09-110 02-09-121 02-09-128 02-09-141 02-09-156 02-10-010 02-10-056 02-10-095
Keywords: lex
Posted-Date: 25 Oct 2002 00:14:29 EDT

"Joachim Durchholz" <joachim_d@gmx.de> writes:


> Lex Spoon wrote:
> >
> > Here's a harder example:
> >
> > /* parse "this */ 10 /* and this" */
> >
> >
> > It will scan as four tokens:
> >
> > /*
> > parse
> > "this */ 10 /* and this"
> > */
> >
> >
> > The parser will be in trouble now if it wants to pull out the "10".
>
> If it hurts, don't do that ;-)
>
> Seriously, IMHO the 10 should be considered part of the comment.
> Otherwise, inserting /* */ around a piece of code will not reliably
> comment it out.
> Besides, most humans (well, at least myself *g*) will parse the initial
> line as a single comment, why should the scanner use a different assumption?


True, but that's only because of the specific example, which I wrote
that way to exhibit the problem clearly. Here is a more realistic
instance of the same problem:


      /* here are the special character codes

              1 '
              2 !
              3 @
              4 ^
              5 "
              6 *
      */

      x = getchar();
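
To make the failure concrete: if the scanner tokenizes the inside of this comment, the lone ' and " look like the start of a character literal and a string literal. A scanner that treats the comment body as plain characters has no such trouble. The following is a minimal C sketch of that character-level approach, not code from any particular compiler; the function name skip_comment_as_chars and its "return 0 on failure" convention are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: src must point at the start of a block comment.
   Returns the offset just past the closing delimiter, or 0 if src does
   not start a comment or the comment is never closed. */
size_t skip_comment_as_chars(const char *src)
{
    size_t i = 2;                       /* step over the opening delimiter */

    if (src[0] != '/' || src[1] != '*')
        return 0;

    while (src[i] != '\0') {
        /* Quotes, stars and everything else are just characters here;
           only a star followed by a slash ends the comment. */
        if (src[i] == '*' && src[i + 1] == '/')
            return i + 2;
        i++;
    }
    return 0;                           /* ran off the end of the input */
}
```

Called on the comment above, it stops right after the closing delimiter, so the next thing the scanner sees is the x = getchar(); statement, whatever quotes or stars the comment held.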


> Basically, it depends on the questions what's the contents of a comment:
> is it a sequence of characters, or a sequence of tokens? That's
> something that should be found in the language definition.


That's true. I thought, however, that the original post suggested
scanning comment contents as tokens even if the language definition
says otherwise. The idea was to use an error token to get around any
problems, but examples like the above show that an error token is not
enough.
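
As a rough illustration of why, here is a minimal C sketch of a token-level scan of a comment body. The assumptions are mine, not from the original post: only double-quoted strings are recognized inside the comment, a string runs until the next double quote, and the name skip_comment_as_tokens is hypothetical. A return value of 0 marks the place where an unterminated literal would have to become an error token.

```c
#include <stddef.h>

/* Hypothetical token-level scan: src must point at the start of a block
   comment. Assumption: a double quote opens a string literal that runs
   to the next double quote, and the literal is consumed as one token. */
size_t skip_comment_as_tokens(const char *src)
{
    size_t i = 2;                       /* step over the opening delimiter */

    if (src[0] != '/' || src[1] != '*')
        return 0;

    while (src[i] != '\0') {
        if (src[i] == '"') {
            i++;                        /* consume the opening quote */
            while (src[i] != '\0' && src[i] != '"')
                i++;                    /* a closing delimiter in here is missed */
            if (src[i] == '\0')
                return 0;               /* unterminated: error-token case */
            i++;                        /* consume the closing quote */
        } else if (src[i] == '*' && src[i + 1] == '/') {
            return i + 2;               /* end of comment seen as a token */
        } else {
            i++;
        }
    }
    return 0;                           /* ran off the end of the input */
}
```

On the earlier input /* parse "this */ 10 /* and this" */ this scan stops only at the final delimiter. The string is well formed, so no error token ever arises to flag that the first closing delimiter was swallowed.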




Lex

