Re: Semantic (Type) analysis phase question (RKRayhawk)
1 Apr 2000 14:02:11 -0500

          From comp.compilers


From: (RKRayhawk)
Newsgroups: comp.compilers
Date: 1 Apr 2000 14:02:11 -0500
Organization: AOL
References: 00-03-138
Keywords: analysis

I would add another consideration to the excellent and concise
responses you have so far. Perhaps you might consider the view that a
compiler (in some situations) is less a code generator and more a user
help facility that enables the coder to find errors and go to the next

So if one approach makes the error messages clearer than another
(whether or not you delay checking until major tree segments are
built), then choose the path that allows the best interaction with
the user of your tool.

An extreme case might be that you can detect errors even _before_ you
build your tree. Let's say that the symbol <- means assign. Let us say
that your language only allows assignment of numerics to numerics and
does not allow assignment of character data to numerics. If a user
assigns a numeric to a numeric, as in

target.numeric <- source.numeric

then things compile well. And that can be parsed with simple positive
logic rules in the grammar. If a user assigns a character data item to
a numeric, as in

target.numeric <- source.character

you can try to trap that with type checking in a generalized
rule such as

assign_stmt : anyitem ASSIGN anyitem

but you would need logic to test the types, either as this general
rule is reduced or, as reasonably suggested by others, later when the
whole tree is complete (or perhaps, by implication, simply later at
some boundary, such as a loop or function).
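
To make the first option concrete, here is a minimal sketch in Python of a type test run at the moment the general rule is reduced. It is only an illustration under assumptions: the SYMBOLS table, the reduce_assign function and the tuple used as a tree node are hypothetical names, not part of any parser generator.

```python
# Hypothetical symbol table: item name -> declared type.
SYMBOLS = {
    "target.numeric": "numeric",
    "source.numeric": "numeric",
    "source.character": "character",
}


def reduce_assign(target, source, symbols=SYMBOLS):
    """Stand-in for the action of: assign_stmt : anyitem ASSIGN anyitem

    Returns (node, errors). The node is built even when errors are
    found, so parsing can continue and report further problems.
    """
    errors = []
    target_type = symbols.get(target)
    source_type = symbols.get(source)

    # Undeclared items are reported separately from type mismatches.
    for name, item_type in ((target, target_type), (source, source_type)):
        if item_type is None:
            errors.append(f"'{name}' is not declared")

    # The one rule from the example: no character data into a numeric.
    if target_type == "numeric" and source_type == "character":
        errors.append(
            f"cannot assign character item '{source}' "
            f"to numeric item '{target}'"
        )

    node = ("assign", target, source)
    return node, errors
```

Calling reduce_assign("target.numeric", "source.character") returns the node together with one diagnostic, while the numeric-to-numeric case returns an empty error list.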

Yet you could also have competing rules: one a positive logic rule, such as

assign_stmt_okay :
  numericitem ASSIGN numericitem
  { /* build tree or gen code */ }

and one a negative logic rule, such as

assign_stmt_type_error :
  numericitem ASSIGN characteritem
  { /* diagnose this */ }

This approach can lead to a proliferation of rules, but if you don't
do that then you get a proliferation of tests for type compatibility
later (either within the 'production' for the general case, or the
subsequent tree walker).
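
The competing-rules idea can be imitated outside a parser generator. The Python sketch below assumes the tokens already arrive typed (NUMERICITEM, CHARACTERITEM) and simply matches each statement against a positive pattern and a negative pattern. The RULES table and parse_statement are hypothetical names for illustration; a real yacc grammar would express the same thing as productions.

```python
# Each entry: (rule name, expected token kinds, is this an error rule?)
RULES = [
    ("assign_stmt_okay",
     ("NUMERICITEM", "ASSIGN", "NUMERICITEM"), False),
    ("assign_stmt_type_error",
     ("NUMERICITEM", "ASSIGN", "CHARACTERITEM"), True),
]


def parse_statement(tokens):
    """Match one statement against the competing rules.

    tokens is a list of (kind, text) pairs.
    Returns (rule_name, diagnostic), where diagnostic is None on success.
    """
    kinds = tuple(kind for kind, _ in tokens)
    for name, pattern, is_error in RULES:
        if kinds != pattern:
            continue
        if is_error:
            # The negative rule knows exactly what went wrong, so it
            # can word the message for this specific mistake.
            target, source = tokens[0][1], tokens[2][1]
            return name, (
                f"type error: character item '{source}' cannot be "
                f"assigned to numeric item '{target}'"
            )
        return name, None
    return "syntax_error", "statement does not match any assignment rule"
```

Note how each anticipated mistake costs one more entry in RULES, which is the proliferation of rules mentioned above.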

The point of drawing this out as an alternative is that it happens
neither as you build the tree nor after, but rather as you parse. And
in some portions of your grammar, this may allow you to set up
diagnostics that describe specifically anticipated coding problems.

If your interest is generalized, and not just for your current effort,
I would propose the idea that the stronger the data typing of the
language, the earlier (in compilation phases) you can detect type
incompatibilities amongst operators. This is because the type
information is on the surface in a strongly typed language, and even
the language of the grammar tool can be provided distinct particles to
reference the distinct types.

So the earlier you want to check type, the earlier you must manifest
the distinctions. In practical circumstances that frequently makes the
grammar too complex for most folks' taste, and type checking is
relegated to some mechanical scans late in the compilation sequence.
So it is partly a question of planning well enough ahead of time to
get all of the attributes of type available to the earliest phase
where you want to get serious about it. In effect the symbol table is
the place where things like type usually are in waiting, and usually
such attributes are not manifest in the grammar.
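
One way to make the type attributes manifest to the grammar is to let the scanner consult the symbol table and hand the parser a different token kind for each declared type. The Python sketch below shows that idea only. The tokenize function, the token kind names and the whitespace-separated input format are assumptions made for the example.

```python
# Map a declared type to the token kind the grammar would see.
TOKEN_KIND_BY_TYPE = {
    "numeric": "NUMERICITEM",
    "character": "CHARACTERITEM",
}


def tokenize(line, symbols):
    """Split a whitespace-separated statement into (kind, text) tokens.

    Identifiers are classified by looking up their declared type in
    the symbol table, so the type distinction is visible to the parser.
    """
    tokens = []
    for text in line.split():
        if text == "<-":
            tokens.append(("ASSIGN", text))
        else:
            declared = symbols.get(text)
            # Undeclared names get their own kind instead of a guess.
            kind = TOKEN_KIND_BY_TYPE.get(declared, "UNKNOWNITEM")
            tokens.append((kind, text))
    return tokens
```

This only works if declarations have been processed before the statements that use them, which is the planning ahead described above.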

But planning is the key. If the majority of your type validation
occurs late, it is still okay to bring some of it up to the front if
that solves problems for you. The highest priority, IMHO, is to
communicate clearly with the user of your tool about errors, whether
type compatibility problems or not. So wherever you can gather
enough information to clearly diagnose problems, that point or any
later one is good enough to check type.

So, for completeness, let me suggest one other variation of the same
theme. Do not trap type problems in some mechanism that cannot easily
read out to the user what the problem is (because of the context of the

For example, if we had items and an operator appearing in many
patterns in a language, such as
    item2 OP item2
do not simply type check this in an extremely universal routine if it
is difficult to tell the user what the overall situation is. For
example, if item2 was supposed to be a numeric, for some reason, do
not get yourself in a situation where you do the age-old trick
    "expecting numeric, found item2"
which could happen if you isolate subrules into very generalized
microscopic pieces. Let us say that this pattern could occur in two
ways. Let us say the language allowed

  ACTION-A item2 OP item2
  ACTION-B item2 OP item2

If the pattern
  item2 OP item2
is at a lower level in the grammar structure, you may need to do extra
work to pull the preceding tokens or other preceding reduced rule
information out of the parse tree you have built.

In such situations it may be better to state, diagnostically,
something to the effect that
  "item2 cannot be used in an ACTION-B statement."

Surely you see that this example is a bit contrived. But the point is
that as you consider putting the type checking early or late, you
inevitably are drawn to the convenience of really generalized
checking, and you will want to map similar cases together. But try to
keep your user in mind as you decide these things, as you are really
designing the diagnostics facility.
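
As a rough sketch of the ACTION-A / ACTION-B example, the check below is still one general routine, but the enclosing statement kind is passed down to it so the message can name the context. Everything here is hypothetical: the ALLOWED table, the check_operands function and the choice that ACTION-B accepts only numerics are invented for illustration.

```python
# Hypothetical: operand types each statement kind accepts.
ALLOWED = {
    "ACTION-A": {"numeric", "character"},
    "ACTION-B": {"numeric"},
}


def check_operands(action, operands, symbols):
    """Type check the operands of 'item OP item' inside a given action.

    Returns a list of diagnostics that mention the statement kind,
    with each offending item reported only once.
    """
    messages = []
    reported = set()
    for name in operands:
        if name in reported:
            continue
        if symbols.get(name) not in ALLOWED[action]:
            messages.append(
                f"{name} cannot be used in an {action} statement"
            )
            reported.add(name)
    return messages
```

With item2 declared as character, checking ["item2", "item2"] under "ACTION-B" yields a single message naming both the item and the statement, while the same operands under "ACTION-A" yield none.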

Best Wishes,

Robert Rayhawk
