Re: Bounds checking, Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>
Tue, 7 May 2019 14:01:18 +0200

From comp.compilers

From:	David Brown <david.brown@hesbynett.no>
Newsgroups:	comp.compilers
Date:	Tue, 7 May 2019 14:01:18 +0200
Organization:	A noiseless patient Spider
References:	19-04-021 19-04-023 19-04-037 19-04-039 19-04-042 19-04-044 19-04-047 19-05-004 19-05-006 19-05-016 19-05-020 19-05-024 19-05-025 19-05-028 19-05-031 19-05-036
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="98368"; mail-complaints-to="abuse@iecc.com"
Keywords:	C, optimize, errors
Posted-Date:	07 May 2019 17:30:35 EDT
Content-Language:	en-GB

On 06/05/2019 14:07, Bart wrote:
> On 05/05/2019 22:38, George Neuner wrote:
>> On Sun, 5 May 2019 11:14:51 +0100, Bart <bc@freeuk.com> wrote:
>
>>> You intend p to refer to the 4-element slice A[3..6], but how does the
>>> language know that? How can it stop code from writing to p[5]?
>>
>> You declare 'p' as int (*p)[4] and then the compiler could check the
>> use. Theoretically at least, I'm not sure it actually is done in many
>> situations.
>
> I declare pointers to arrays as T(*)[] when generating C code. But
> you're right in that no one else does that when writing C.
>

I'd declare a pointer to an array in that way too, as would anyone else
programming C - because that is what the type means. But I would not
use it for a pointer /into/ an array, since that is a different thing.
And George had been talking about slices of an array, which is different
again.

If you have an array A of 10 ints ("int A[10];"), and you take a pointer
to, say, element 3 ("int * p = &A[3];"), then that pointer is /not/
compatible with a type "int (*)[4]" or any other pointer-to-array type.
So using such types for "slices" into a C array is not valid - it would
be undefined behaviour. (To be fair, I think it is likely that most
compilers will implement it in the way you would expect - but C
certainly does not say it should work.)
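
A minimal sketch of the distinction, using the array A from the paragraph
above (the function names are illustrative assumptions, not anything from
the thread). A pointer to the whole array and a pointer into the array
have different types and designate objects of different sizes:

```c
#include <stddef.h>

static int A[10];

/* Pointer TO the array: type int (*)[10], designates all ten ints. */
size_t size_via_array_pointer(void)
{
    int (*pa)[10] = &A;
    return sizeof *pa;          /* 10 * sizeof(int) */
}

/* Pointer INTO the array: type int *, designates a single int. */
size_t size_via_element_pointer(void)
{
    int *p = &A[3];
    return sizeof *p;           /* sizeof(int) */
}

/* int (*bad)[4] = &A[3];   would need a cast: int * is not compatible
                            with int (*)[4], and C does not promise that
                            using the converted pointer works. */
```

The commented-out line is the "slice" attempt; a conforming compiler has
to diagnose it unless a cast is added.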

Remember that even though arrays decay to pointers in most expressions,
arrays are independent types in C. An array of 10 ints contains ints;
it does not contain sub-arrays of different sizes any more than an
"int" contains two "short int" items.

> Note that this is an open bound; usually the bound will be dynamic, and
> held in a separate variable, which the language does not know is the bound.
>

True. C has no standard way to associate an array size with a pointer.
In a new language, such an association is likely to be a good idea.

> C has something called VLAs, which is really a type where any bounds are
> defined as a runtime expression.

Yes, except that they are types, not "a type" - VLAs are types that are
not fixed until run-time, and the type of a given VLA may vary each time
you call the function or enter the block in which it is defined.
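
As a small sketch of that point (the function name vla_bytes is a
hypothetical, and this assumes a compiler that supports VLAs, which have
been an optional feature since C11), the array type below is only fixed
when execution reaches the declaration, so sizeof is evaluated at run
time and can differ on every call:

```c
#include <stddef.h>

/* Returns the size in bytes of a VLA of n ints; n must be greater than 0. */
size_t vla_bytes(size_t n)
{
    int vla[n];                 /* the type int[n] is determined here, per call */
    return sizeof vla;          /* computed at run time for a VLA */
}
```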

> If you had a loop which extracted
> different slices on each iteration, you would be obliged to declare 'p'
> within the loop, so it has a slightly different type (with different
> bounds) each time around.
>

As noted above, using a pointer to an array as a slice of a bigger array
is not valid. If you want "slice" types in C, you have to make them
with a struct (structs and unions are the only ways of defining new
types in C) and make your own access functions or macros. It would be
possible, and can be done in a type-safe way, but it would undeniably be
ugly and awkward.
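
One possible shape for such a struct-based slice, as a rough sketch only:
the names int_slice, make_slice, slice_get and slice_set are assumptions
made up for illustration, and assert() stands in for whatever bounds
policy a real implementation would choose.

```c
#include <assert.h>
#include <stddef.h>

/* A slice pairs a pointer into an array with the number of valid elements. */
typedef struct {
    int    *data;
    size_t  len;
} int_slice;

/* Make a slice of len elements starting at base[start]; base has base_len elements. */
int_slice make_slice(int *base, size_t base_len, size_t start, size_t len)
{
    assert(start <= base_len && len <= base_len - start);
    return (int_slice){ base + start, len };
}

/* Checked read of element i of the slice. */
int slice_get(int_slice s, size_t i)
{
    assert(i < s.len);
    return s.data[i];
}

/* Checked write of element i of the slice. */
void slice_set(int_slice s, size_t i, int value)
{
    assert(i < s.len);
    s.data[i] = value;
}
```

With "int A[10];", make_slice(A, 10, 3, 4) describes A[3..6], and
slice_set(s, 5, x) trips the assertion instead of writing past the slice.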

> But this is very restrictive (for example I don't like using local block
> scopes).

This is /you/ being restrictive. You can't blame a language for your
own personal prejudices about features you don't want to use!

> It is also a rather heavyweight feature just to allow the
> possibility of bounds checking.
>
> (Also something I haven't implemented in my own C compiler; I just don't
> know how to approach it. And I don't like the feature.)
>
> Proper slicing (since we are not restricted to C or other existing
> languages) is simpler and better.
>

Agreed. C does not have such a feature as part of the language. (In
C++, you can make it - and it will be standardised in the up-coming
"ranges" library. But in C it would be a mess.)

>>>    struct {int a,b,c,d;} S;
>>>
>>>    p = &S.a;
>>>
>>> You intend p to be used to access a,b,c,d as an int[4] array, but p's
>>> bounds will say it's only one element long.
>>
>> The larger problem is that C even permits that.
>
> I was half-expecting someone to say it was undefined behaviour. I
> suppose you will say the way to declare that pointer is as:
>
>    int (*p)[4] = (int(*)[4])&S.a;

That would be undefined behaviour. (If I were being pedantic, I'd say
that /using/ the pointer p here would be undefined behaviour. But I am
not going to do that.)

>
> The problem is that if you want to make C a safer, checked language,
> none of this stops people writing it the wrong way.

Correct. C is not a strongly typed language. It is rather cumbersome
to use it in a strongly type-checked manner (you need to keep wrapping
things in structs, which then become more awkward to use), and there are
always ways to get round any restrictions.
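
To illustrate the "wrapping things in structs" style, here is a minimal
sketch with made-up types (metres, seconds and metres_add are assumptions
for illustration only). Two structs with identical contents are still
distinct types, so the compiler rejects mixing them, at the price of
spelling out every operation by hand:

```c
/* Distinct wrapper types: same representation, incompatible types. */
typedef struct { double value; } metres;
typedef struct { double value; } seconds;

/* Addition has to be written explicitly for each wrapped type. */
metres metres_add(metres a, metres b)
{
    return (metres){ a.value + b.value };
}

/* metres_add(m, s) with s of type seconds is diagnosed at compile time,
   but nothing stops code from reaching in and using s.value directly. */
```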

>
>> If you want the
>> struct elements also to be available as an array, you should have used
>> a union.
>
> Maybe the struct is defined elsewhere and is not your code to change. Or
> maybe the struct is {int a,b,c[20];}, and you want to treat a, b, c[0],
> c[1] as an array.
>
> The fact is that this is a low level language. You need to be able to do
> stuff like this.
>

Actually it is extremely rare that you need to do stuff like that.
Define your types the way you need to use them, and use them. Very
occasionally, you need to do something like that with externally defined
types - a few lines of accessor code is simple enough.
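
For the {int a,b,c[20];} case mentioned in the quoted text, the accessor
code could look something like the sketch below. The struct tag ext and
the function ext_at are hypothetical names standing in for an externally
defined type and a helper of your own:

```c
#include <stddef.h>

/* Stand-in for a struct defined elsewhere that you cannot change. */
struct ext { int a, b, c[20]; };

/* View a, b, c[0], c[1], ... as logical elements 0, 1, 2, 3, ...
   Returns NULL when the index is out of range. */
int *ext_at(struct ext *s, size_t i)
{
    switch (i) {
    case 0:  return &s->a;
    case 1:  return &s->b;
    default: return (i - 2 < 20) ? &s->c[i - 2] : NULL;   /* here i >= 2 */
    }
}
```

Every access goes through a member of the struct, so no pointer is ever
moved from one member to another.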

>> C has a lot of warts, no question ... but its biggest problem is that
>> the routine (ab)use of pointers in, so-called, "idiomatic" C in a real
>> sense is working against the compiler - making its job much harder.
>
> So hard that I wouldn't even attempt it. Creating a more restrictive,
> safer (or easier to check) language would be easier (IMO).
