Re: Tokenizer

"Robert Monroe" <robert.f.monroe@verizon.net>
18 Oct 2002 23:43:37 -0400

          From comp.compilers


From: "Robert Monroe" <robert.f.monroe@verizon.net>
Newsgroups: comp.compilers
Date: 18 Oct 2002 23:43:37 -0400
Organization: http://groups.google.com/
References: 02-10-026
Keywords: lex
Posted-Date: 18 Oct 2002 23:43:36 EDT

"Harvinder Singh" <harvinderrikhi@yahoo.co.uk> wrote
> Should tokenizer decide the meaning of the tokens in a given context
> or the parser is responsible for it.
> ...
> < and > are token for me
>
> 1. Now if my tokens comes in quotes i.e in a string, should tokenizer
> still give me < and > as tokens or should it returns me string token
>
> 2. Lets take an example of xml
>
> <T> >
> </T>
>
> The second > is to be treated as a string, should tokenizer give
> me this info or the parser should tell me this.
>
> In nutshell what is the job of a tokenizer and the parser.


Item #1 and the example in item #2 describe two different things. For
item #1, I agree with the moderator: I would turn the quoted string
into a single token. Item #2 is not a quoted string; it is character
data. In XML that is usually handled by writing a 'character entity'
(e.g. &gt;) in the input in place of the literal character. In that
case the lexer would usually return the character as a single token.
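
To make the distinction concrete, here is a rough sketch of a lexer
that treats both cases the way I describe. It is only an illustration:
the token names (STRING, CHAR, LT, GT, SLASH, NAME) and the tokenize()
function are my own invention, and it handles nothing beyond the five
predefined XML entities.

```python
import re

# Hypothetical token names, chosen for this sketch only.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),                     # a whole quoted string is one token
    ("ENTITY", r"&(?:lt|gt|amp|quot|apos);"),   # predefined XML entity reference
    ("LT",     r"<"),
    ("GT",     r">"),
    ("SLASH",  r"/"),
    ("NAME",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

ENTITIES = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"', "&apos;": "'"}


def tokenize(text):
    """Return a list of (kind, value) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue                      # whitespace is skipped, not returned
        if kind == "STRING":
            value = value[1:-1]           # strip the quotes; < and > inside stay literal
        elif kind == "ENTITY":
            kind, value = "CHAR", ENTITIES[value]   # the entity becomes one character token
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    print(tokenize('"<a>"'))
    print(tokenize("<T>&gt;</T>"))
```

With this, the < and > inside the quoted string never reach the parser
as separate tokens, and the &gt; between the tags comes back as a
single CHAR token rather than as a GT.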


I hope that helps,
      Bob.

