Problems lexing out tokens

"JMB" <dos_programmer@yahoo.com>
13 Oct 2002 16:07:06 -0400

          From comp.compilers

Related articles
Problems lexing out tokens dos_programmer@yahoo.com (JMB) (2002-10-13)
Re: Problems lexing out tokens vbdis@aol.com (VBDis) (2002-10-18)
From: "JMB" <dos_programmer@yahoo.com>
Newsgroups: comp.compilers
Date: 13 Oct 2002 16:07:06 -0400
Organization: http://groups.google.com/
Keywords: lex, comment
Posted-Date: 13 Oct 2002 16:07:06 EDT

This question is sort of related to compilers. I want to write a
Pascal-to-HTML source converter (for fun). The way I have it set up is
one main module and two slave modules: one is the Scanner, and the
other is HTMLOutput. Each call to Scanner should return one token, and
since we want to keep comments and whitespace, it doesn't skip them.
I've decided that the Scanner shouldn't know that it's in the middle
of a comment; that is the job of Main. That took care of comments
spanning multiple lines. Now the problems are:


First, lines *can* be longer than 255 characters, but I can't extract
more than 255 characters at a time. So if I happen to be tokenizing a
really long string literal and the cut falls just before a digit, the
next call to Scanner will try to scan a number rather than continue
the string. Second, string literals, which I consider it the Scanner's
job to recognize in their entirety, can span multiple lines without
violating the language rules. Not that the program cares about
language rules.
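One way to make both problems disappear from the Scanner's point of view is to put a small reader between it and the file, so the 255-character reads become invisible. Below is a minimal sketch in Python rather than Pascal, assuming `io.StringIO` stands in for the source file; `CharReader`, `next_token`, `tokens` and `CHUNK_SIZE` are hypothetical names. Token text is accumulated in a growable list instead of a fixed-size string, so a literal may be longer than one read and may cross line breaks.

```python
import io

CHUNK_SIZE = 255  # stand-in for the 255-character extraction limit


class CharReader:
    """Hypothetical reader that hides chunk boundaries from the Scanner.

    It reads at most CHUNK_SIZE characters at a time, but refills its
    buffer on demand, so the Scanner just sees one continuous stream."""

    def __init__(self, stream):
        self.stream = stream
        self.buf = ""
        self.pos = 0

    def peek(self):
        """Next character without consuming it; '' at end of input."""
        if self.pos >= len(self.buf):
            self.buf = self.stream.read(CHUNK_SIZE)
            self.pos = 0
        return self.buf[self.pos] if self.buf else ""

    def advance(self):
        """Consume and return the next character; '' at end of input."""
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch


def next_token(reader):
    """Return (kind, text), or None at end of input.

    Token text is collected in a list, so its length is not capped at
    CHUNK_SIZE, and a string literal may run across line breaks."""
    ch = reader.advance()
    if ch == "":
        return None
    parts = [ch]
    if ch == "'":
        kind = "string"
        while True:
            c = reader.advance()
            if c == "":
                break  # unterminated literal at end of input
            parts.append(c)
            if c == "'":
                if reader.peek() == "'":  # Pascal's doubled quote ''
                    parts.append(reader.advance())
                else:
                    break
    elif ch.isspace():
        kind = "whitespace"
        while reader.peek().isspace():
            parts.append(reader.advance())
    elif ch.isdigit():
        kind = "number"
        while reader.peek().isdigit():
            parts.append(reader.advance())
    elif ch.isalpha() or ch == "_":
        kind = "ident"
        while reader.peek().isalnum() or reader.peek() == "_":
            parts.append(reader.advance())
    else:
        kind = "symbol"
    return kind, "".join(parts)


def tokens(reader):
    """Yield tokens until end of input."""
    while (tok := next_token(reader)) is not None:
        yield tok
```

For example, with the input `"x := '" + "a" * 249 + "12345';"` the first 255-character read ends exactly before `12345`, yet the Scanner still returns one string token, because only the reader knows where the reads were cut.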


Anyway, what would be an elegant solution to this? I thought about
having a boolean, isComplete, to let Main know whether the Scanner had
finished getting the remainder of the token, and to let the Scanner
pick up where it left off. That works, except when I encounter
whitespace again: the type of the token from the last call (string)
gets overwritten (as whitespace), and the Scanner resumes thinking
it's supposed to continue with whitespace rather than a string
literal.
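If the input must still reach the Scanner in chunks, the overwrite can be avoided by keeping the "resume" state in its own field instead of reusing the type of the last token returned, similar in spirit to lex start conditions. Below is a minimal Python sketch, assuming chunks of at most 255 characters (or whole lines). `ResumableScanner`, `pending` and `scan_chunk` are hypothetical names, and the `complete` flag in each triple plays the role of isComplete. Only string literals resume here. Other tokens cut at a boundary simply come back as two tokens of the same kind, which is harmless for HTML coloring.

```python
import re

# Token rules for everything except string literals (hypothetical, simplified).
SIMPLE_RE = re.compile(
    r"(?P<whitespace>\s+)|(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>.)",
    re.DOTALL,
)


class ResumableScanner:
    """Scanner fed one chunk at a time (a <=255-char piece, or one line).

    `pending` records the kind of a token left unfinished at the end of the
    previous chunk (here only "string"), or None. It is kept apart from the
    kinds of the tokens being returned, so emitting a whitespace token can
    never overwrite it."""

    def __init__(self):
        self.pending = None

    def scan_chunk(self, chunk):
        """Return a list of (kind, text, complete) triples for one chunk."""
        out = []
        i = 0
        while i < len(chunk):
            if self.pending == "string" or chunk[i] == "'":
                # When resuming, chunk[i] is string content, not an opening quote.
                start = i if self.pending == "string" else i + 1
                end = chunk.find("'", start)
                if end == -1:
                    # Literal continues into the next chunk.
                    out.append(("string", chunk[i:], False))
                    self.pending = "string"
                    i = len(chunk)
                else:
                    out.append(("string", chunk[i:end + 1], True))
                    self.pending = None
                    i = end + 1
            else:
                # SIMPLE_RE always matches one token at i, since '.' takes any char.
                m = SIMPLE_RE.match(chunk, i)
                out.append((m.lastgroup, m.group(), True))
                i = m.end()
        return out
```

Feeding `"x := 'abc "` and then `" 42 def';"` reproduces the case from the post: the second chunk starts with whitespace and then a digit, but because `pending` is still `"string"`, the scanner returns `" 42 def'"` as the rest of the literal. A Pascal doubled quote `''` is treated here as a close followed by a reopen, which yields two adjacent string tokens and the same coloring.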


Thank you for any insight/hints/suggestions...
[I'd just put code in your lexer to work around the archaic
255-character string limit in whatever language you're using. -John]
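The module split described at the start of the post, with Main rather than the Scanner owning the "inside a comment" flag, might look roughly like the code below. This is a minimal Python sketch. The function names, the token regex, and the CSS class names are all hypothetical, and only `{ }` comments are handled.

```python
import html
import re

# Hypothetical token pattern: the Scanner knows nothing about comments,
# so '{' and '}' come back as ordinary one-character symbol tokens.
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)"
    r"|(?P<string>'[^']*'?)|(?P<symbol>.)"
)


def scan_line(line):
    """Scanner: yield (kind, text) for every token, whitespace included."""
    for m in TOKEN_RE.finditer(line):
        yield m.lastgroup, m.group()


def html_output(kind, text):
    """HTMLOutput: wrap one token in a span whose class names its kind."""
    return f'<span class="{kind}">{html.escape(text)}</span>'


def main(lines):
    """Main: owns the 'inside a comment' flag, so it survives line breaks."""
    in_comment = False
    out = []
    for line in lines:
        for kind, text in scan_line(line):
            if not in_comment and text == "{":
                in_comment = True
            if in_comment:
                out.append(html_output("comment", text))
                if text == "}":
                    in_comment = False
            else:
                out.append(html_output(kind, text))
    return "".join(out)
```

Because `in_comment` lives in `main`, a comment opened on one line stays open on the next. The catch is that the Scanner still tokenizes comment text as code. An apostrophe inside a comment, as in `{ it's }`, starts a string token that swallows the closing brace. That is one argument for letting the Scanner track such modes itself.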

