Re: Number of compiler passes

George Neuner <gneuner2/@/comcast.net>
Fri, 25 Jul 2008 19:29:07 -0400

          From comp.compilers

Related articles
Number of compiler passes m.helvensteijn@gmail.com (Michiel) (2008-07-21)
Re: Number of compiler passes gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-07-21)
Re: Number of compiler passes gneuner2@comcast.net (George Neuner) (2008-07-21)
Re: Number of compiler passes m.helvensteijn@gmail.com (Michiel) (2008-07-22)
Re: Number of compiler passes dwashington@gmx.net (Denis Washington) (2008-07-25)
Re: Number of compiler passes m.helvensteijn@gmail.com (Michiel) (2008-07-25)
Re: Number of compiler passes gneuner2/@/comcast.net (George Neuner) (2008-07-25)
Re: Number of compiler passes m.helvensteijn@gmail.com (Michiel) (2008-07-26)
Re: Number of compiler passes gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-07-27)
Re: Number of compiler passes gneuner2@comcast.net (George Neuner) (2008-07-28)
Re: Number of compiler passes gneuner2@comcast.net (George Neuner) (2008-07-28)
Re: Number of compiler passes gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-07-29)
Re: Number of compiler passes gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-07-29)
[6 later articles]
| List of all articles for this month |

From: George Neuner <gneuner2/@/comcast.net>
Newsgroups: comp.compilers
Date: Fri, 25 Jul 2008 19:29:07 -0400
Organization: Compilers Central
References: 08-07-041 08-07-044 08-07-048
Keywords: analysis
Posted-Date: 26 Jul 2008 05:32:39 EDT

On Tue, 22 Jul 2008 21:31:57 +0200, Michiel <m.helvensteijn@gmail.com>
wrote:


>George Neuner wrote:
>
>>>* Flex + Bison to create AST
>>>* 1 Over AST to find declarations (and re-declarations)
>>>* 2 Over AST to detect which variables are used in which functions
>>>* 3 Over AST to find undeclared references
>>>* 4 Over AST to find the type of each expression and note type mismatches
>>>* 5 Over AST to find the access-type (read/write/both) of each expression
>>>* 6 Over AST to perform optimizations
>>>* 7 Over AST to translate to target language
>>
>> You don't give much detail of your language, but I don't see how
>> "readability or provability" would be harmed by combining passes 1 & 2
>> ...
>> Assuming variables must be declared, processing declarations gives you
>> all you need - the symbol table(s) you construct will provide scoping
>> for non-local variables and functions. I don't see a need for two
>> passes.
>
>In the source language, a function can use variables from outside its
>scope. More importantly, these variables could be declared after the
>function is. This is okay as long as this function is not called
>before the declaration of these variables.


As long as non-local variables are transitively in the lexical scope
chain of the function that uses them you can still gather all the
declarations in one pass. If they're not, you have a language that
will be error prone and confusing to use.


The simplest way to do it is to organize your symbol "table" as a
stack of lists[*]. You attach a local symbol list to each function's
definition node in the AST. As you recurse into a function, you push
its local symbol list onto the symbol stack and pop it off again when
you leave the function. The list of globals you keep at the bottom of
the stack. To look up a symbol you just examine the lists starting
from the top of the stack - the first correctly named symbol you find
will be the one nearest in scope to the usage point. This allows for
lexical shadowing of same-named variables in enclosing scopes.


[*] or stack of whatever data structure floats your boat.




>> I don't know what you mean by "access-type" of an expression - unless
>> you're talking about I/O in which case I still don't know what you
>> mean. We could be having a terminology disconnect - to my mind an
>> "expression" always computes or side effects, so it always reads
>> operands and sometimes writes a result.
>
>Yes, I suppose I am stretching the definition a bit. An expression has a
>type (bool, int, etc.). But to me it also has an access type. This is
>either readonly, writeonly or readwrite. It basically specifies whether it
>is an r-value, an l-value or both.


I believe you are overthinking the problem.


L-value or R-value is a matter of usage, not type. Obviously it
matters whether a particular variable is in an operand position or in
a result position, but that depends on the class of the expression -
unary, binary, ternary, etc. - and not on the operand or result types
or the operators involved.


IMO, value handling should be isolated in the handler function for the
particular expression type and is not anything I would ever care about
at the symbol information level.


If you really want to, you can annotate the variable's AST node as to
whether it's used for value or address, but I don't think that's
necessary. Since you are targeting C which already has a fairly
sophisticated understanding of complex data types, your compiler
really doesn't have to bother. If your source language says "A[X] :=
A[X] + 1", you can just say that directly in C after appropriately
defining A and X. You don't need to transform it all the way down
into address computations and dereferences.


George
--
for email reply remove "/" from address



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.