Re: Writing a C Compiler: lvalues

Stargazer <stargazer3p14@gmail.com>
Mon, 10 May 2010 00:44:28 -0700 (PDT)

          From comp.compilers


From: Stargazer <stargazer3p14@gmail.com>
Newsgroups: comp.lang.c,comp.compilers
Date: Mon, 10 May 2010 00:44:28 -0700 (PDT)
Organization: Compilers Central
References: 10-05-036
Keywords: C, code
Posted-Date: 11 May 2010 03:15:57 EDT

On May 8, 4:34 pm, André Wagner <andre....@gmail.com> wrote:
> Hello,
>
> I'm writing a C compiler. It's almost over, except that it is not
> handling lvalues correctly.


It's not "almost over" then :-)


> Let me show an example. The code "x = 5" (let's say 'x' was declared
> before) yields this in pseudo-assembly:
>
> mov $b, $fp+8   ; $fp+8 is x's address, so I'm storing it in $b
> mov $a, 5
> mov [$b], $a    ; store what's in $a at the address $b points to
>
> Since 'x' is an lvalue in this case, I don't need its value, just the
> address of the variable.
>
> Now, if I want to access 'x' in the middle of a non-lvalue expression,
> I would do:
>
> mov $a, $fp+8
> mov $a, [$a]


It looks like real x86 assembly, and it looks like you're jumping into
assembly generation too early.


> Notice how I get the variable address and, from it, the value.
>
> What I'm trying to say is: the compiler yields different assembly code
> for when 'x' is an lvalue and when 'x' is not an lvalue.
>
> This gets more confusing when I have expressions such as 'x++'. This
> is simple, since 'x' is obviously an lvalue in this case. In the case
> of the compiler, I can parse 'x' and see that the lookahead points to
> '++', so it's an lvalue.


No, you can't assume that the programmer always writes correct code. A
programmer may make a mistake, as in Eric's example, or may write junk
such as


if (heaven)
    666--;


and the compiler must be able to detect that an assignment to a
non-lvalue takes place.
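
A minimal sketch of such a check, assuming the parser builds an
expression tree before any code is generated. The node kinds and the
names "struct expr" and "is_lvalue" are hypothetical, and the rule is
simplified (it ignores const objects, arrays and function designators).
The check looks at the operand itself, not at the lookahead token, and
parentheses simply pass the answer through from the inner expression:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical AST node kinds; a real compiler would have many more. */
enum expr_kind { EXPR_CONST, EXPR_VAR, EXPR_PAREN, EXPR_DEREF, EXPR_ADD };

struct expr {
    enum expr_kind kind;
    const struct expr *lhs;   /* operand of PAREN/DEREF, left operand of ADD */
    const struct expr *rhs;   /* right operand of ADD, otherwise NULL */
};

/* Returns true if the expression designates an object (is an lvalue). */
bool is_lvalue(const struct expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_VAR:   return true;               /* x                        */
    case EXPR_DEREF: return true;               /* *p                       */
    case EXPR_PAREN: return is_lvalue(e->lhs);  /* (x) is an lvalue iff x is */
    default:         return false;              /* 666, a + b, ...          */
    }
}
```

With this, the handler for "++", "--" and "=" can call is_lvalue() on
its operand and issue a diagnostic when the answer is false, so "666--"
is rejected while "x--" and "(x)--" are accepted.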


> But what about '(x)++'? In this case, the compiler evaluates the
> subexpression '(x)', and this expression results in the value of 'x',
> not the address. Now I have a '++' ahead, so how can I know the address
> of 'x' since all that I have is a value?


When I attempted to write a C compiler (with a hand-written parser), I
defined a "simpler C" pseudo-code: a subset of C that allowed only
assignments of the forms "__temp_NN = &var;", "__temp_NN = *__temp_MM;",
"*__temp_NN = __temp_MM;", "__temp_NN = ~__temp_MM;" (instead of "~"
there could be "!" or "-") and "__temp_NN = var1 + var2;" (instead of
"+" there could be any arithmetic or logical binary operator). Also
allowed were conditional branches of the form "if (__temp_NN != 0) goto
xxx;" and unconditional branches ("goto xxx;"). The "__temp_NN" were
temporary variables of a type suitable for machine registers; when the
registers ran out, the remaining temporaries were added as extra local
variables.
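
To make the shape of that pseudo-code concrete, here is one possible
in-memory representation, given as a sketch only. The type and field
names ("ir_op", "ir_insn", "ir_format") are invented for illustration
and are not taken from my old compiler. Each instruction kind
corresponds to one of the statement forms listed above, and ir_format()
prints an instruction back in that "simpler C" spelling:

```c
#include <assert.h>
#include <stdio.h>

/* One instruction kind per statement form the pseudo-code allows. */
enum ir_op {
    IR_ADDR,    /* __temp_NN = &var;              */
    IR_LOAD,    /* __temp_NN = *__temp_MM;        */
    IR_STORE,   /* *__temp_NN = __temp_MM;        */
    IR_UNARY,   /* __temp_NN = ~__temp_MM;        */
    IR_BINARY,  /* __temp_NN = var1 + var2;       */
    IR_IFNZ,    /* if (__temp_NN != 0) goto xxx;  */
    IR_GOTO     /* goto xxx;                      */
};

struct ir_insn {
    enum ir_op op;
    int nn;             /* the NN in the forms above */
    int mm;             /* the MM in the forms above */
    const char *oper;   /* operator spelling for IR_UNARY / IR_BINARY */
    const char *a;      /* var, var1, or the branch label */
    const char *b;      /* var2 */
};

/* Writes the textual form of one instruction into buf; returns the
   snprintf() result, or -1 for an unknown instruction kind. */
int ir_format(char *buf, size_t n, const struct ir_insn *i)
{
    assert(buf != NULL && i != NULL);
    switch (i->op) {
    case IR_ADDR:
        return snprintf(buf, n, "__temp_%d = &%s;", i->nn, i->a);
    case IR_LOAD:
        return snprintf(buf, n, "__temp_%d = *__temp_%d;", i->nn, i->mm);
    case IR_STORE:
        return snprintf(buf, n, "*__temp_%d = __temp_%d;", i->nn, i->mm);
    case IR_UNARY:
        return snprintf(buf, n, "__temp_%d = %s__temp_%d;",
                        i->nn, i->oper, i->mm);
    case IR_BINARY:
        return snprintf(buf, n, "__temp_%d = %s %s %s;",
                        i->nn, i->a, i->oper, i->b);
    case IR_IFNZ:
        return snprintf(buf, n, "if (__temp_%d != 0) goto %s;", i->nn, i->a);
    case IR_GOTO:
        return snprintf(buf, n, "goto %s;", i->a);
    }
    return -1;
}
```

Because every instruction prints as valid C, the output of the parser
can be read directly, or even fed to an existing C compiler, to check
that it is right.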


Then "x" and "address of x" would be evaluated separately, something
like "__temp_1 = x;", then at the next sequence point: "__temp_2 = &x;
*__temp_2 = __temp_1;". If "x" is not an lvalue, the compiler fails
while generating "__temp_2 = &x;" and issues a diagnostic.

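As an illustration of how this answers the '(x)++' question, here is a
small sketch that lowers a post-increment into such pseudo-code. All
names ("struct node", "gen_addr", "gen_post_inc") are hypothetical, the
"+ 1" with a constant operand is an assumption about the pseudo-code,
and this variant takes the address first and loads the value through
it. The point is that the address is requested from the operand
subtree, so parentheses just forward the request, and a non-lvalue
fails at exactly that step:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, deliberately tiny expression tree. */
enum node_kind { NODE_CONST, NODE_VAR, NODE_PAREN };

struct node {
    enum node_kind kind;
    const char *name;           /* NODE_VAR: variable name       */
    long value;                 /* NODE_CONST: literal value     */
    const struct node *inner;   /* NODE_PAREN: wrapped expression */
};

static int next_temp = 1;       /* number of the next free __temp_NN */

/* Emits "__temp_N = &var;" and returns N, or returns -1 after a
   diagnostic if e is not an lvalue. */
int gen_addr(const struct node *e, FILE *out)
{
    assert(e != NULL);
    switch (e->kind) {
    case NODE_VAR: {
        int t = next_temp++;
        fprintf(out, "__temp_%d = &%s;\n", t, e->name);
        return t;
    }
    case NODE_PAREN:            /* (x) has the same address as x */
        return gen_addr(e->inner, out);
    default:
        fprintf(stderr, "error: lvalue required\n");
        return -1;
    }
}

/* Lowers "e++"; returns the temporary holding the old value (the
   result of the expression), or -1 if e is not an lvalue. */
int gen_post_inc(const struct node *e, FILE *out)
{
    int addr = gen_addr(e, out);
    if (addr < 0)
        return -1;
    int old = next_temp++;
    int incr = next_temp++;
    fprintf(out, "__temp_%d = *__temp_%d;\n", old, addr);
    fprintf(out, "__temp_%d = __temp_%d + 1;\n", incr, old);
    fprintf(out, "*__temp_%d = __temp_%d;\n", addr, incr);
    return old;
}
```

Starting from a fresh counter, lowering '(x)++' this way emits

    __temp_1 = &x;
    __temp_2 = *__temp_1;
    __temp_3 = __temp_2 + 1;
    *__temp_1 = __temp_3;

while '666++' stops in gen_addr() with the diagnostic and emits nothing.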

Pseudo-code is a good thing: it makes the parser easy to debug and is
also easy for an optimizer to process. The pseudo-code should be defined
so that it meets the C standard's requirements (if the standard is a
guide for programmers, for a compiler writer it is an SRS) and so that
it includes only operations supported by any sensible CPU architecture.


Note that while you don't need to care about anything that is
"undefined behavior" (the generated code need not be meaningful), you
must add special processing rules for the standard's constraints,
because a constraint violation requires a diagnostic.


