Related articles |
---|
1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 devriese@cs.tcd.ie (Edsko de Vries) (2006-01-31) |
Re: 1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 torbenm@app-0.diku.dk (2006-01-31) |
Re: 1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 rsc@swtch.com (Russ Cox) (2006-01-31) |
Re: 1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 jm@bourguet.org (Jean-Marc Bourguet) (2006-01-31) |
Re: 1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-01-31) |
Re: 1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 gneuner2@comcast.net (George Neuner) (2006-01-31) |
Re: 1 - 1, 1 -1, 1-1, 1 - -1 and -2147483648 rivers@dignus.com (Thomas David Rivers) (2006-01-31) |
[5 later articles] |
From: | "Edsko de Vries" <devriese@cs.tcd.ie> |
Newsgroups: | comp.compilers |
Date: | 31 Jan 2006 09:57:37 -0500 |
Organization: | http://groups.google.com |
Keywords: | lex, arithmetic, question |
Posted-Date: | 31 Jan 2006 09:57:37 EST |
Hi,
I have a question regarding lexical analysis. I recently came across a
bug in our lexical analyser in phc (www.phpcompiler.org), that I am
unsure how to solve. This is the problem: our current definition for
integer constant looks something like
INT ([1-9][0-9]*)|0
In particular, note that it does not allow for an (optional) "+" or "-"
at the start of the integer. This means that the strings "1 - 1", "1
-1" and "1-1" all generate the same sequence of three tokens INT(1),
OP(-), INT(1), for which the syntax analyser generates the subtree
BIN_OP(-, 1, 1).
For the string "1 - -1", the lexer (unsurprisingly) generates INT(1),
OP(-), OP(-), INT(1). The syntax analyser recognises this as BIN_OP(1,
UNARY_OP(-, 1)). In other words, the second "-" is treated as a unary
operator, rather than as part of the number.
This works fine, with the sole exception of the number "-2147483648".
The problem is, of course, overflow: -2147483648 is a valid negative
number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a
valid positive number. Thus, the above method of dealing with "-" as a
unary operator breaks down.
The solution is to interpret the "-" as part of the number, and
generate INT(-2147483648), rather than OP(-), INT(...). However,
changing the definition of INT to
INT [+-]?([1-9][0-9]*)|0
causes "1-1" to be recognised as INT(1), INT(-1), which is incorrect.
Even requiring a space before the operator
INT (" "[+-])?([1-9][0-9]*)|0
does not help, because "1 -1" will be recognised as INT(1), INT(-1)
instead of INT(1), OP(-), INT(1).
Is there a good solution that solves both problems? At the moment, I
have added a new rule to the lexer that matches this exact number,
which solves the problem - but it's a hack.. (actually, there are two
rules, one for 32-bit machines, one for 64-bit machines..)
Thanks,
Edsko
Return to the
comp.compilers page.
Search the
comp.compilers archives again.