Re: Implementing types in a compiler

John Lilley <jlilley@empathy.com>
30 Jan 1997 22:21:36 -0500

          From comp.compilers

Related articles
Implementing types in a compiler pjmlp@students.si.fct.unl.pt (1997-01-29)
Re: Implementing types in a compiler pertsserver!mkgardne@uunet.uu.net (1997-01-30)
Re: Implementing types in a compiler jlilley@empathy.com (John Lilley) (1997-01-30)
| List of all articles for this month |

From: John Lilley <jlilley@empathy.com>
Newsgroups: comp.compilers
Date: 30 Jan 1997 22:21:36 -0500
Organization: Nerds for Hire, Inc.
References: 97-01-238
Keywords: OOP, types

Paulo Jose Pinto - LEI wrote:


> more sense to me to create an abstract class Type and then use
> ProcedureType, ConstType and other for all kinds of information. After
> all when we have an union in C, that should be implement in C++ as an
> abstract class and all the cases as descendents of that class. But
> this requires some sort of dynamic_cast so that I know which member
> function I can call, after all doesn't make sense to use a member
> function GetParameter (....) in ConstType.


I'm posting as much to get other's responses as to offer an opinion.
In a C++ parser I am working on, I represent types like you suggest --
using a base class an then creating derived CPPType classes from it.
For various reasons my base class is not abstract (for example, the
base class contains the list of type-modifiers). From the CPPType
base class, I derive some others:


      CPPTypeBuiltin -- int, long, char, wchar_t, etc
      CPPTypeClass -- class types
      CPPTypeEnum -- enums
      CPPTypeModifier -- an intermediate base for type-modification


The CPPTypeModifier conceptually modifies another CPPType. There are
specific modifiers derived from it:


      CPPTypeIndirection -- reference/pointer
      CPPTypeArrayOf
      CPPTypePointerToMember
      CPPTypeFunctionReturning
      CPPTypeFunctionArgument
      ... and so on ...


This approach is static and address-dependent (e.g., very early
binding). I assume and guarantee that there is no type aliasing. In
other words, if a CPPTypeIndirection describes a pointer to another
type, then it is the *only* CPPTypeIndirection in existence that
modifies that particular type in that particular way. This leads to
compact type representations, trie-fashion.


There are other approaches to this as well. On the other extreme is a
symbolic (very late binding) approach, in which all types are desribed
as LISPish trees of modifiers where the information is mostly symbolic
or name-oriented. This is very flexible, and minimizes the amount of
processing that happens in the parser (which is, in general, a good
thing).


I have found that for C++, the parser needs immediate access to all
the type and declaration information (to parse template
specializations), so I choose the early-binding approach to give me
easily-walked data structures. I wish someone would show me that the
C++ parser doesn't really need all the type information to parse
syntax, but no such advice has been forthcoming :(


I've pondered other, hybrid approaches, such as constructing a
symbolic AST first, and then when symbol table information is needed,
process the AST incrementally and add its information to the symbol
table. It can be argued that the symbol table is merely a cache of
the AST and is there solely for performance reasons. But it's a
fairly dramatic speedup; walking the AST is O(N) and the symbol table
is O(logN) or O(Constant).


I'd like to hear other opinions.


john lilley
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.