Re: how to avoid a memset() optimization

"Charles Bryant" <>
13 Nov 2002 12:17:24 -0500

          From comp.compilers

Related articles
how to avoid a memset() optimization (Francis Wai) (2002-11-07)
Re: how to avoid a memset() optimization (Lars Duening) (2002-11-08)
Re: how to avoid a memset() optimization (Alex Colvin) (2002-11-08)
Re: how to avoid a memset() optimization (Fergus Henderson) (2002-11-12)
Re: how to avoid a memset() optimization (Christian Bau) (2002-11-12)
Re: how to avoid a memset() optimization (Lars Duening) (2002-11-12)
Re: how to avoid a memset() optimization (Clayton Weaver) (2002-11-12)
Re: how to avoid a memset() optimization (Charles Bryant) (2002-11-13)
Re: how to avoid a memset() optimization (Dobes Vandermeer) (2002-11-13)
Re: how to avoid a memset() optimization (Fergus Henderson) (2002-11-13)
Re: how to avoid a memset() optimization (Jan C. Vorbrüggen) (2002-11-13)
Re: how to avoid a memset() optimization (Arthur Chance) (2002-11-13)
Re: how to avoid a memset() optimization (Chris F Clark) (2002-11-15)
Re: how to avoid a memset() optimization (Arthur Chance) (2002-11-15)

From: "Charles Bryant" <>
Newsgroups: comp.compilers
Date: 13 Nov 2002 12:17:24 -0500
Organization: Compilers Central
References: 02-11-030
Keywords: optimize, comment
Posted-Date: 13 Nov 2002 12:17:24 EST

Francis Wai <> wrote:
>In a recent article (,
>Peter Gutmann raised a concern which has serious implications in
>secure programming. His example, along the lines of,
>int main()
>{
> char key[16];
> strcpy(key, "whatever");
> encrpts(key);
> memset(key, 0, 16);
>}
>where memset() was optimized away because memset() is the last
>expression before the next sequence point and that its side-effect is
>not needed and that the subject of memset() is an auto variable. The
>compiler sees that it is legitimate to optimize it away.

Using a standard function such as memset() may permit such
optimisation, but if you write a special function for the purpose,
for example, clrmem(char *buf, unsigned size), then the compiler
cannot optimise it away. How could the compiler know that clrmem()
doesn't compute a value based on its input and store the result in a
global or static variable for later collection? The only way is if
there is global optimisation which is so aggressive that it deduces
the behaviour of functions in separate files. In fact, it must be the
linker which does this, since at compilation time you might not have
written clrmem(). And even if such a linker conspires with a compiler
to eliminate dead code, you can always write clrmem() in assembly.
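
As a rough sketch of that idea, clrmem() with the signature suggested
above might look like the following, kept in a source file of its own.
The volatile-qualified pointer is an added assumption, not part of the
argument above: it is a common extra precaution in case the optimiser
does get to see the function body. Whether the standard strictly
requires the stores to be kept when the underlying object is not itself
volatile is open to interpretation.

```c
/* clrmem.c: kept in a separate source file, so a compiler that sees
 * only the caller cannot know what the function does. */

void clrmem(char *buf, unsigned size)
{
        /* Assumption: stores through a volatile-qualified pointer are
         * treated as side effects and so are not dropped, even if the
         * optimiser can see this body. */
        volatile char *p = buf;

        while (size--)
                *p++ = 0;
}
```

The caller would then use clrmem(key, sizeof(key)) in place of the
memset() call.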

Ultimately, in all but the most bizarre systems, there must be a way
to accomplish what you want to do, since exactly the same situation
occurs when you write data into a buffer and some piece of hardware
uses DMA to read the data. Note that such systems couldn't verify
that you really were doing DMA since a device might use DMA to fetch
a descriptor block and then interpret it as instructions for further
DMA operations - ultimately the first block could be executable code
for another CPU, so defeating the optimisation is only impossible in
some sort of closed system where the compiler and linker between them
have total knowledge of all hardware devices that will ever be
designed for the system.

Unfortunately, having apparently proven that you can achieve what you
want, I must now prove the opposite.

Firstly, let me address the issue of 'volatile'. Its appearance in
this context parallels its appearance in discussions of
multithreaded programming. Beginners to multithreaded programming
often believe that the use of 'volatile' is necessary and sufficient
to protect simultaneous access to a variable. While it may be on some
systems, in general (and in particular in conjunction with the
popular POSIX threading standard) it is neither necessary nor
sufficient. To see why, consider a system with two CPUs: A and B.
Suppose the function encrpts() is computationally intensive and split
between the two CPUs. The hardware might be connected like this:

CPU A + cache <---> Memory <---> cache + CPU B

Obviously this may result in some of key[] appearing in each cache.
The problem with 'volatile' is that when one CPU (e.g. CPU A)
executes code using 'volatile' semantics, at most it will result in
its cache and main memory being modified. There's no mechanism to
force the cache belonging to CPU B to be modified as well.

Of course this particular problem is very minor, since the program no
longer regards key[] as containing valid data, so CPU B's cache will
get discarded eventually, and only if there's a bug in the code will
it let part of key[] leak out.
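
For comparison, here is a minimal sketch of how shared data is usually
protected under POSIX threads: a mutex around every access, with no
'volatile' anywhere. POSIX specifies that the lock and unlock calls
synchronise memory between threads, which is why the plain variable is
enough. The names shared_counter, counter_add() and counter_get() are
invented for illustration.

```c
#include <pthread.h>

/* Hypothetical shared state: note the absence of 'volatile'. */
struct shared_counter {
        pthread_mutex_t lock;
        long value;
};

static struct shared_counter counter = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Add n to the counter; may be called from any thread. */
void counter_add(long n)
{
        pthread_mutex_lock(&counter.lock);
        counter.value += n;
        pthread_mutex_unlock(&counter.lock);
}

/* Read the counter under the same lock. */
long counter_get(void)
{
        long v;

        pthread_mutex_lock(&counter.lock);
        v = counter.value;
        pthread_mutex_unlock(&counter.lock);
        return v;
}
```

Declaring value as 'volatile' here would add nothing, and removing the
mutex while keeping 'volatile' would not make the increment safe.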

However, there's a far more serious problem: the memory page
containing key[] might have been written to a paging file.
While that part of the paging file will eventually be re-used, it may
take an arbitrarily long time, and in the meantime the system may be
switched off, leaving the key as a mere bit pattern on the disk
vulnerable to any diagnostic tool that reads what used to be the
paging area.

Ultimately, this is a hardware problem. The requirement is that
hardware be put into a specific state, so the solution must be
hardware specific. The normal way to address this is by providing an
abstract interface which encapsulates the required semantics, leaving
the implementation to vary as necessary. In this case, for example:

int prepare_secure(secdesc_t *area, void *buf, unsigned size);

        Prepares the specified memory area for secure access,
        saving any necessary state in *area. This may involve
        locking the pages it occupies, or merely modifying the
        pager so that if it gets paged out, as soon as it is
        paged in again that disk page is immediately scheduled
        to be overwritten.

int delete_secure(secdesc_t *area);

        Ensures the area previously prepared with
        prepare_secure() is completely erased.

This would make the code:

int main()
{
        char key[16];
        secdesc_t s;
        if (prepare_secure(&s, key, sizeof(key))) {
                printf("Cannot prepare secure area\n");
                return 1;
        }
        strcpy(key, "whatever");
        encrpts(key);
        memset(key, 0, 16);
        if (delete_secure(&s)) {
                printf("Cannot delete secure area\n");
                return 1;
        }
        return 0;
}

Obviously this cannot be optimised away any more than the call to
printf() can be optimised away.
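
One possible implementation of this interface on a POSIX system could
be sketched as follows. It takes the "locking the pages" route using
mlock() and munlock(). The layout of secdesc_t is an assumption (the
interface above leaves it undefined), and the wipe through a
volatile-qualified pointer is an added precaution rather than anything
the interface requires. Other systems would need a different
implementation behind the same two calls.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Assumed layout; the interface description leaves secdesc_t open. */
typedef struct {
        void *buf;
        size_t size;
} secdesc_t;

/* Lock the pages holding buf so they are not written to the paging
 * file. Returns 0 on success, non-zero on failure. */
int prepare_secure(secdesc_t *area, void *buf, unsigned size)
{
        area->buf = buf;
        area->size = size;
        return mlock(buf, size) != 0;
}

/* Overwrite the area, then release the page lock.
 * Returns 0 on success, non-zero if the unlock fails. */
int delete_secure(secdesc_t *area)
{
        volatile unsigned char *p = area->buf;
        size_t n = area->size;

        while (n--)
                *p++ = 0;
        return munlock(area->buf, area->size) != 0;
}
```

Note that mlock() works on whole pages and may fail if the process
exceeds its locked-memory limit, which is why the caller has to check
the return value. Page locks do not nest, so unlocking one area also
unlocks anything else sharing its pages.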
[Volatile isn't intended to address cache coherency and other multi-processor
issues. You need something beyond C to handle that. But see a subsequent
message where someone actually looked at the relevant parts of the C
standard. -John]
