Expression stack, code generation...

"Orlando Llanes" <ollanes@prodigy.net>
9 Mar 2002 02:52:34 -0500

From comp.compilers

Related articles
*Expression stack, code generation... ollanes@prodigy.net (Orlando Llanes)* (2002-03-09)**

| List of all articles for this month |

From:	"Orlando Llanes" <ollanes@prodigy.net>
Newsgroups:	comp.compilers
Date:	9 Mar 2002 02:52:34 -0500
Organization:	Prodigy Internet http://www.prodigy.com
Keywords:	question
Posted-Date:	09 Mar 2002 02:52:33 EST

        First of all, I want to thank Jack Crenshaw for the awesome "Let's Build
A Compiler" tutorials!!! :) I have two books on the subject, and neither of
them gave me as great a start as the tutes did. I bought "Writing Compilers
and Interpreters: An Applied Approach" (original C edition) roughly eight
years ago, and I got "The Dragon Book" roughly two years ago, yet I couldn't
begin to understand them until I read the tutorials.

        BTW, when I say stack I might actually mean a queue. I call it a stack
simply because it saves stuff for later use.

        I modified the parse.pas program from one of Jack Crenshaw's
tutorials and came up with a stack based Recursive Descent expression
parser. When the parser gets to Factor or Ident (which is called from
within Factor), it pushes the value onto the stack. At the end of Add,
Subtract, Multiply, and Divide the operation is pushed onto the stack
if the previous two values are not integer constants. Otherwise, if
the previous two values are integer constants they are combined. As
you will see, order of operations are still properly maintained.

The following expression...

a = 123 + 456 * 789 + a + 111 + b * 34 + 14

....becomes...

359907
A
+
111
+
B
34
*
+
14
+

(if you're interested in the code, a "prototype of a prototype" in Pascal,
point your browsers to http://pages.prodigy.net/ollanes/parse.pas and e-mail
me if you have any questions about it)

        Obviously the A in the beginning, and the = at the end were left out,
that's because an expression isn't always the result of an assignment. In
other words expressions are used for assignments, function arguments,
Boolean expressions, loops, etc.

        Further expression reductions can be done (ie: 12 * a * 45 = 540 *
a), but that's too "advanced" for me right now. So the next step is code
generation.

        I want to generate code that looks something like the following...

load reg1, 359907
add reg1, [A]
add reg1, 111
load reg2, [B]
mul reg2, 34
add reg1, reg2
add reg1, 14

        BTW, don't worry about register contention, this is for a virtual
machine so it won't be a bottleneck (unless someone develops a JIT compiler
that is :P).

        The biggest problem I see here is generating the correct code while
"walking" the stack.

I'm thinking of the following...

        while (Stack[Index].type != c_Op) && (Index < Top)
                Index++

        That's the easy part... The hard part is getting the factors while
getting the operators. A total of about 3 positions on the stack have to be
maintained while at the same time checking for the top of the stack: 1) the
"current" factor, 2) the operator, 3) the start of the current precedence
"level".

For example...

Iteration 1:
123 <--- 1) "current" factor & 3) start
A
456
* <--- 2) the operator
+
B
+

Iteration 2:
123 <--- 3) start
A <--- 1) "current" factor
456
* <--- 2) the operator
+
B
+

Iteration 3:
123 <--- 3) start
A
456 <--- 1) "current" factor
* <--- 2) the operator
+
B
+
        Emit "load reg1, 123"
        Emit "load reg2, [A]"
        Emit "mul reg2, 456"

Iteration 4:
123 <--- 3) start
A
456 <--- 1) "current" factor
*
+ <--- 2) the operator
B
+
        Emit "add reg1, reg2"

Iteration 5:
123
A
456
*
+
B <--- 1) "current" factor & 3) start
+ <--- 2) the operator
        Emit "load reg2, [B]"
        Emit "add reg1, reg2"

        The other hard part is having a maximum of four 32-bit Integer
registers, and a maximum of four 64-bit Floating point registers. When all
registers are used, they must either be pushed into the VM stack at
run-time, or must be pushed onto a temporary stack at compile time.
Otherwise, the code generator would have to go to the "innermost" part of
each precedence level of the expression and work its way backwards.

        Is there a "well documented" algorithm that uses this technique? If not,
am I on the right track?

        BTW, I'm thinking Unary operators can be handled from the Factor
function. This requires a one *token* lookahead and would look as follows...

if IsOp(Token) && (NextToken = c_tkIntConst)
        if Token == c_tkMinus
                GetToken
                TokenVal = -TokenVal
        elseif Token == c_tkPlus
                GetToken
        elseif Token == c_tkNot
                GetToken
                TokenVal = !TokenVal
        endif
        PushIntConst( TokenVal )

elseif IsOp(Token) && (NextToken = c_tkIdent)
        if Token == c_tkMinus
                GetToken
                PushVar(TokenName)
                PushOp(c_tkNeg)
        elseif Token == c_tkPlus
                GetToken
                PushVar(TokenName)
        elseif Token == c_tkNot
                GetToken
                PushVar(TokenName)
                PushOp(c_tkNot)
        endif
endif

See ya!
Orlando

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Expression stack, code generation...

"Orlando Llanes" <ollanes@prodigy.net>9 Mar 2002 02:52:34 -0500

"Orlando Llanes" <ollanes@prodigy.net>
9 Mar 2002 02:52:34 -0500