Re: Context sensitive scanner ?

ok@cs.rmit.edu.au (Richard A. O'Keefe)
29 Nov 1997 00:23:43 -0500

          From comp.compilers


From: ok@cs.rmit.edu.au (Richard A. O'Keefe)
Newsgroups: comp.compilers
Date: 29 Nov 1997 00:23:43 -0500
Organization: Comp Sci, RMIT University, Melbourne, Australia.
References: 97-11-117
Keywords: lex

Albert Theo Hofkamp <hat@se-46.wpa.wtb.tue.nl> writes:
>1) Literal reals (such as 1.2),
>2) Nested index operations on arrays (such as x.1.2).
>[Well, disregarding the question of whether it's a good idea to write
>languages with lexical puns that practically beg people to write code
>that the compiler will misinterpret, I'd have the lexer return
>integers and dots as separate tokens and put reals together in the
>parser. -John]


Hmm. 1.01 => INT<1> DOT INT<1>. Not quite right: once "01" has been
converted to the integer 1, the parser can no longer tell 1.01 from 1.1.
I worked on a compiler once that did this kind of thing, and we had to be
extra careful. In the end we rejected the approach as too buggy.
Note that there are several languages where the interpretation of a
character depends on the type of the previous token.


AWK:
    If the previous token was one that could end an operand,
        / is the division operator (a/b)
    Otherwise,
        / begins a pattern literal /..../


Ada:
    If the previous token was one that could end an operand,
        ' is the attribute tick (string'length)
    Otherwise,
        ' begins a character literal ('x').
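
The AWK rule above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not how any real AWK is
implemented: the token kinds, the simplified pattern syntax, and the name
scan_awk_like are all made up for the example.

```python
import re

# Hypothetical, simplified token shapes (not AWK's real lexical grammar).
PATTERN = re.compile(r"/(?:\\.|[^/\\\n])*/")
OPERAND = re.compile(r"\d+|[A-Za-z_]\w*")


def scan_awk_like(text):
    """Return (kind, lexeme) pairs; '/' is resolved from the previous token."""
    tokens = []
    prev_ends_operand = False  # nothing precedes the first token
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        m = OPERAND.match(text, i)
        if m:
            tokens.append(("OPERAND", m.group()))
            prev_ends_operand = True
            i = m.end()
        elif ch == "/" and prev_ends_operand:
            # Previous token could end an operand: division operator.
            tokens.append(("DIV", "/"))
            prev_ends_operand = False
            i += 1
        elif ch == "/":
            # Otherwise '/' begins a pattern literal /..../
            m = PATTERN.match(text, i)
            if m is None:
                raise SyntaxError(f"unterminated pattern at offset {i}")
            tokens.append(("PATTERN", m.group()))
            prev_ends_operand = True
            i = m.end()
        else:
            tokens.append(("PUNCT", ch))
            prev_ends_operand = ch in ")]"
            i += 1
    return tokens
```

With this sketch, scan_awk_like("a / b") treats the slash as division,
while scan_awk_like("x ~ /a+b/") treats it as the start of a pattern.
The same one-bit memory of the previous token would serve for the Ada tick.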


It sounds as though this language may be another of the same sort:


If the previous token was one that could end an operand,
    . is the indexing dot
Otherwise,
    . is the decimal point character.


If the previous token was an indexing dot,
    a digit string can only begin an integer literal
Otherwise,
    a digit string may begin an integer or a floating point literal.


So you need three states:
    - previous token was one that could end an operand
      (numeric literal, identifier, right parenthesis, ...)
    - previous token was an indexing dot
    - all other cases.
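
As a rough illustration of those three states, here is a minimal scanner
sketch in Python. The language in question is not specified in this thread,
so the token kinds, the regular expressions, and the names Prev and scan
are assumptions made for the example. It also assumes a real literal needs
a digit on both sides of the point, so a dot that does not follow an
operand is reported as an error.

```python
import re
from enum import Enum, auto


class Prev(Enum):
    OPERAND_END = auto()  # numeric literal, identifier, right parenthesis, ...
    INDEX_DOT = auto()    # previous token was an indexing dot
    OTHER = auto()        # all other cases, including start of input


# Hypothetical token shapes; the original post does not define them.
REAL = re.compile(r"[0-9]+\.[0-9]+")
INT = re.compile(r"[0-9]+")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def scan(text):
    """Return (kind, lexeme) pairs, tracking the previous-token state."""
    tokens = []
    prev = Prev.OTHER
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch in "0123456789":
            # After an indexing dot a digit string can only be an integer.
            m = None if prev is Prev.INDEX_DOT else REAL.match(text, i)
            kind = "REAL" if m else "INT"
            m = m or INT.match(text, i)
            tokens.append((kind, m.group()))  # lexeme kept as text: "01" stays "01"
            prev, i = Prev.OPERAND_END, m.end()
            continue
        m = IDENT.match(text, i)
        if m:
            tokens.append(("IDENT", m.group()))
            prev, i = Prev.OPERAND_END, m.end()
        elif ch == ".":
            if prev is not Prev.OPERAND_END:
                # Assumption: a real literal must start with a digit.
                raise SyntaxError(f"unexpected '.' at offset {i}")
            tokens.append(("DOT", "."))
            prev, i = Prev.INDEX_DOT, i + 1
        else:
            tokens.append(("PUNCT", ch))
            prev = Prev.OPERAND_END if ch in ")]" else Prev.OTHER
            i += 1
    return tokens
```

Under these assumptions scan("x.1.2") yields an identifier followed by two
DOT INT pairs, while scan("1.2") yields a single REAL token. Because the
real literal is matched as one lexeme, the leading zero in 1.01 survives.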


--
John Æneas Byron O'Keefe; 1921/02/04-1997/09/27; TLG,TLTA,BBTNOTL.
Richard A. O'Keefe; RMIT Comp.Sci; http://www.cs.rmit.edu.au/%7Eok
--

