Re: Appropriate methods for argument checking

chase@centerline.com (David Chase)
26 Jul 1996 23:14:28 -0400

          From comp.compilers

Related articles
Appropriate methods for argument checking clodius@nis.lanl.gov (William Clodius) (1996-07-22)
Re: Appropriate methods for argument checking grout@polestar.csrd.uiuc.edu (1996-07-23)
Re: Appropriate methods for argument checking chase@centerline.com (1996-07-26)
Re: Appropriate methods for argument checking jgllgher@maths.tcd.ie (Dara Gallagher) (1996-07-31)

From: chase@centerline.com (David Chase)
Newsgroups: comp.compilers
Date: 26 Jul 1996 23:14:28 -0400
Organization: CenterLine Software
References: 96-07-144
Keywords: design, debug

William Clodius <clodius@nis.lanl.gov> writes:
> There has been a discussion this week in comp.lang.fortran about the
> appropriate method for checking whether a function is being called with
> the appropriate argument types. Some of the older programmers are not happy
> with F90's modules and claim that at least this capability "should" be
> implemented by having type consistency checks in the loader.


> I have been skeptical of this because I have heard that Bertrand Meyer has
> had great difficulty implementing compile time checking for Eiffel to deal
> with its covariance requirements, and because properly defining and
> conveying the appropriate type information to a loader that has to deal with
> multiple source languages looked more complicated and less robust to me than
> to the old timers.


(skip ahead to "irrelevant" if you don't care about co/contravariance and would
rather see claims about what it is possible to check)


Covariance is the problem here. Without "global consistency checks", it is
unsound. One way around this is (I think I have this correct) parametric
polymorphism, but that's not what Eiffel uses, and it yields a different
set of "legal substitutions". (Beware of tricky jargon in any discussion
of this -- always try to figure out what substitutions are legal). By
"legal substitution", I simply mean the relationship between the variable/parameter
type and the value types that may be legally assigned/bound to that variable
or parameter. The difference between contravariance and covariance via
parametric polymorphism (again, please correct me if I have the term wrong)
is this:


Contravariance (using a somewhat concrete example)
    IF char IS ALWAYS A short (i.e., if [-128,127] is a subset of [-32768,32767])
    THEN the following procedure-typed variable assignments are legal in this
              C-like syntax that is NOT LEGAL C.


            char (*f)(char) = f_of_char_yielding_char;
            char (*f)(char) = f_of_short_yielding_char; // can take short parameters,
                                                        // will only see char.


            short (*f)(char) = f_of_char_yielding_short;
            short (*f)(char) = f_of_char_yielding_char; // a char result is also a short
            short (*f)(char) = f_of_short_yielding_char;
            short (*f)(char) = f_of_short_yielding_short;


            short (*f)(short) = f_of_short_yielding_char;
            short (*f)(short) = f_of_short_yielding_short;


For the two types given here, these are ALL the legal assignments.
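
The rule behind this list can be written down and enumerated mechanically. What follows is only a sketch in Python, not anything from the original discussion: types are modeled as inclusive integer ranges, char is assumed to be [-128,127], and the names `is_subtype`, `assignable` and `legal_assignments` are made up for the illustration. A function may be bound to a variable when it accepts at least what callers may pass and returns at most what callers expect.

```python
from itertools import product

# Assumption of this sketch: a type is just an inclusive range of integers.
TYPES = {"char": (-128, 127), "short": (-32768, 32767)}


def is_subtype(a, b):
    """True if every value of type a is also a value of type b."""
    (a_lo, a_hi), (b_lo, b_hi) = TYPES[a], TYPES[b]
    return b_lo <= a_lo and a_hi <= b_hi


def assignable(var_sig, fn_sig):
    """Contravariant rule for binding a function to a procedure-typed variable.

    Both signatures are (parameter_type, result_type) pairs.
    """
    var_param, var_result = var_sig
    fn_param, fn_result = fn_sig
    # Parameters: the function must accept everything a caller may pass.
    # Results: the function must return only what a caller expects.
    return is_subtype(var_param, fn_param) and is_subtype(fn_result, var_result)


def legal_assignments():
    """All (variable signature, function signature) pairs that are legal."""
    sigs = list(product(TYPES, repeat=2))
    return [(v, f) for v in sigs for f in sigs if assignable(v, f)]


if __name__ == "__main__":
    for (v_param, v_result), (f_param, f_result) in legal_assignments():
        print(f"{v_result} (*f)({v_param}) = f_of_{f_param}_yielding_{f_result};")
```

Run as a script, this prints nine lines: the eight assignments listed above, plus the identity case `char (*f)(short) = f_of_short_yielding_char;`, which is trivially legal.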


If you use parametric polymorphism, the variable type is actually polymorphic:


            T (*f)(T) <type T> = f_of_char_yielding_char;
            T (*f)(T) <type T> = f_of_short_yielding_short;


At this point, I have pretty much hit the limit of what I can confidently say
about this sort of polymorphism, but note that I am decidedly unsure about the
legality of any other assignments that might be made (actually, I think you
can still make the contravariant assignments, but I would not bet a lot
of money on this without further study). When David Shang talks about
types (which he inevitably will), I think this is the sort of type system
that he has in mind. It is very important to note that the signature


    T (*f)(T)<type T>


does not let you subtype T willy-nilly when binding -- the contravariant
assignment is legal if it is regarded in two steps:


    f_of_short_yielding_char may be substituted for either a


          short (*f)(short)


    or a


          char (*f)(char)


    so if you do the binding to one or the other (or if it is done implicitly
    for you) then you may in turn assign that to T (*f)(T)<type T>. Note that
    this simple example has already introduced the problem of ambiguity -- in
    this case, it seems not to matter, but ambiguity is in general unsettling
    (two doesn't look bad, unless I can double the number of cases again, and
    again, and again, by simple additions to the type system).


    It is not, not, not valid to reason like this:


        "I'm assigning a function to a variable with T (*f)(T)<type T>, let's
          say 'T' is a short, and char and short are both subtypes of short,
          I'll treat the two occurrences of 'T' in the polymorphic signature
          independently, so I'll bind f_of_char_yielding_short to it."
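
The two-step reading can be sketched in Python under the same caveats as the rest of this discussion. The sketch assumes that types are integer ranges, that char is [-128,127], and that T must be instantiated to one single type for both occurrences. The helper name `instantiations` is hypothetical. It returns every T for which a given function may stand in for `T (*f)(T)`.

```python
# Assumption of this sketch: a type is just an inclusive range of integers.
TYPES = {"char": (-128, 127), "short": (-32768, 32767)}


def is_subtype(a, b):
    """True if every value of type a is also a value of type b."""
    (a_lo, a_hi), (b_lo, b_hi) = TYPES[a], TYPES[b]
    return b_lo <= a_lo and a_hi <= b_hi


def instantiations(fn_param, fn_result):
    """Every single type T such that the function fits T (*f)(T).

    Step 1 is the contravariant check against the monomorphic signature
    T (*f)(T); step 2 (binding that to the polymorphic variable) is then
    trivial. The SAME T is used for parameter and result.
    """
    return [t for t in TYPES
            if is_subtype(t, fn_param) and is_subtype(fn_result, t)]


if __name__ == "__main__":
    for fn_param, fn_result in [("short", "char"), ("char", "char"),
                                ("short", "short"), ("char", "short")]:
        choices = instantiations(fn_param, fn_result)
        print(f"f_of_{fn_param}_yielding_{fn_result}: T in {choices}")
```

Here f_of_short_yielding_char gets two possible choices of T (the ambiguity mentioned above), while f_of_char_yielding_short gets none, which is why the reasoning quoted above is not valid.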


Note (keep in mind the comment about not-legal-C) that this depends upon
a common representation for all the different types, or else something
(expensive) under the covers to deal with varying representations. It
is plausible to implement C in a way such that all parameter and result
passing "widens to integer" (note that unsigned int and signed int are
typically incompatible types when regarded as sets of values), but
this is not guaranteed by the standard. In the case of "object types",
as realized in most "OO" languages, "object values" are represented as
pointers to storage, and hence all object values have a common interchange
representation. It is possible to do it all differently -- for instance,
I might choose to represent the subrange type [1000,1255] in 8 bits, so a
translation step would be necessary. This becomes especially hairy when
you introduce by-reference parameters.
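
To make the translation step concrete, here is a small Python sketch for the by-value case only. It assumes the subrange [1000,1255] is stored as an offset from 1000; the names `pack`, `unpack`, `adapt` and `clamp_successor` are invented for the illustration. The wrapper built by `adapt` is the "something under the covers" that converts between the compact and the ordinary representation on every call.

```python
LOW, HIGH = 1000, 1255  # the subrange from the text; 256 values, so one byte


def pack(value):
    """Translate a value in [LOW, HIGH] to its 8-bit representation."""
    if not LOW <= value <= HIGH:
        raise ValueError(f"{value} is outside [{LOW},{HIGH}]")
    return value - LOW


def unpack(byte):
    """Translate an 8-bit representation back to the value it stands for."""
    if not 0 <= byte <= 255:
        raise ValueError(f"{byte} does not fit in 8 bits")
    return byte + LOW


def adapt(wide_fn):
    """Wrap a function on ordinary integers so that it takes and returns
    packed subrange values. The wrapper is the translation step."""
    def packed_fn(byte):
        return pack(wide_fn(unpack(byte)))
    return packed_fn


def clamp_successor(n):
    """Hypothetical function over the subrange: next value, capped at HIGH."""
    return min(n + 1, HIGH)
```

A by-reference parameter cannot be handled by a wrapper this simple, since the callee would be reading and writing storage laid out in the caller's representation.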


I am very, very nervous about making claims as to what is possible,
because Type Experts tell me that it is easy to wander off into the
land of the halting problem when considering type systems. This can
be postponed by not doing type inference, but some of the resulting
types are so complicated that this may not be a reasonable option for
programmers who are merely human.


However, since Fortran lacks covariance, this is slightly irrelevant.


For consistency checking, my employer (this is a slightly shameless plug)
recently announced a product that includes this for C++. It checks
the "One Definition Rule" (externally visible types with the same
name ought to have the same definition), and it checks for consistent
declarations of return types and parameters for functions in both C
and C++. It also checks a variant of the one-definition rule for ANSI C.
The checking is done by a separate program invoked at link time -- this
is not encoded into the type names. Similar checking is performed as
a matter of course for other languages, such as (my favorite example)
Modula-3, and, of course, Java.
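
As a rough illustration of the One Definition Rule part, here is a Python sketch. It is not a description of how the product mentioned above works; the input format (a mapping from object file name to the type definitions it exports, each definition reduced to a string) and the function name `odr_violations` are assumptions made for the example.

```python
from collections import defaultdict


def odr_violations(units):
    """Return the names of types that are defined differently in different units.

    units maps a unit name to {type name: definition}. Definitions are
    compared as whitespace-normalized strings, which is a simplification.
    """
    seen = defaultdict(set)
    for types in units.values():
        for name, definition in types.items():
            seen[name].add(" ".join(definition.split()))
    return sorted(name for name, defs in seen.items() if len(defs) > 1)


if __name__ == "__main__":
    units = {
        "a.o": {"Point": "struct { int x; int y; }"},
        "b.o": {"Point": "struct { int x; long y; }",
                "Node": "struct { int v; }"},
        "c.o": {"Node": "struct {  int v; }"},
    }
    print(odr_violations(units))
```

With the sample data, only Point is reported: Node has the same definition in both units once whitespace is normalized.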


So, I think it is possible, and not too hard, either. My suspicion
is that types in Fortran-90 are no more complicated than those in
Modula-3, and probably a good deal less complicated than they are in C++.
Note that the use of a separate program to check this (as opposed to
using mangled names to check this in the linker) simplifies interlanguage
issues substantially -- if the information about an entrypoint is lacking
(because it was written in C, for instance), then the checker can merely
mention this as a warning, and continue.
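
The warn-and-continue behavior might look like the following Python sketch. The data layout and the name `check_entrypoints` are hypothetical: each definition is either a (result type, parameter types) pair or None when the unit carries no type information, and each use records the signature the caller was compiled against.

```python
def check_entrypoints(definitions, uses):
    """Compare the signature each caller expects with the one actually defined.

    definitions: {name: (result_type, param_types) or None if unknown}
    uses:        list of (name, (result_type, param_types)) seen at call sites
    Returns (errors, warnings) as lists of messages.
    """
    errors, warnings = [], []
    for name, expected in uses:
        actual = definitions.get(name)  # missing entry is treated like None
        if actual is None:
            # No information, e.g. the entrypoint was written in C:
            # mention it and carry on rather than failing the link.
            warnings.append(f"{name}: no type information, not checked")
        elif actual != expected:
            errors.append(f"{name}: caller expects {expected}, "
                          f"definition is {actual}")
    return errors, warnings


if __name__ == "__main__":
    definitions = {
        "solve": ("double", ("double", "int")),
        "c_helper": None,
    }
    uses = [
        ("solve", ("double", ("double", "double"))),
        ("c_helper", ("int", ("int",))),
    ]
    errors, warnings = check_entrypoints(definitions, uses)
    print(errors)
    print(warnings)
```

Because the check lives in a separate program rather than in mangled names, an unknown entrypoint costs a warning instead of an unresolved symbol.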


speaking for myself,


David Chase
--

