space_voyager - 27 days ago
C++ Question

Is reading a union member that wasn't the most recently written one undefined behavior in GCC?

The C++ reference (cppreference.com) has the following explanation for unions. The part relevant to this question is the sentence about reading a member that wasn't the most recently written:


The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined, and it's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.
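The two layout rules in the quote (the union is at least as large as its largest member, and all members live in the same bytes) can be checked without ever reading an inactive member. Below is a minimal C++11 sketch. `max2`, `largest_member` and `members_share_address` are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>

// Same members as the union in the question.
union myUnion {
    int var1;
    long int var2;
    char var3;
};

// Hypothetical helper: std::max is only constexpr from C++14, and the question
// compiles with -std=c++11, so a one-line constexpr maximum is used instead.
constexpr std::size_t max2(std::size_t a, std::size_t b) { return a > b ? a : b; }

constexpr std::size_t largest_member =
    max2(sizeof(int), max2(sizeof(long int), sizeof(char)));

// A union must hold its largest member; it may also be padded for alignment,
// so ">=" is the portable check rather than "==".
static_assert(sizeof(myUnion) >= largest_member,
              "a union is at least as large as its largest member");

// Every member begins at the first byte of the union's storage.
// Only addresses are compared here; no member value is read.
bool members_share_address(const myUnion& u) {
    const void* base = &u;
    return base == static_cast<const void*>(&u.var1) &&
           base == static_cast<const void*>(&u.var2) &&
           base == static_cast<const void*>(&u.var3);
}
```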


Now, if I compile the following code on Linux Mint 18 with g++ -std=c++11, I get the output shown in the comments next to the printf statements:

#include <cstdio>
using namespace std;

union myUnion {
    int var1;      // 32 bits
    long int var2; // 64 bits
    char var3;     // 8 bits
}; // union size is 64 bits (size of largest member)

int main()
{
    myUnion a;
    a.var1 = 10;
    printf("a is %ld bits and has value %d\n",sizeof(a)*8,a.var1); // ...has value 10
    a.var2 = 123456789;
    printf("a is %ld bits and has value %ld\n",sizeof(a)*8,a.var2); // ...has value 123456789
    a.var3 = 'y';
    printf("a is %ld bits and has value %c\n",sizeof(a)*8,a.var3); // ...has value y
    printf("a is %ld bits and has value %ld\n",sizeof(a)*8,a.var2); //... has value 123456789, why???
    return 0;
}


On the line before return 0, reading a.var2 gives not the decimal ASCII code of the 'y' character (which is what I expected; I'm new to unions) but the value I assigned to it earlier. Based on the above quote from cppreference.com, am I to understand that this is undefined behavior in the sense that it is not standard, but rather GCC's particular implementation?
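Reading a.var2 right after writing a.var3 is exactly the case the quoted rule calls undefined. One common way to stay within the rule is to store a tag recording which member was written last and to check it before every read. The sketch below assumes that approach; `TaggedValue` and its accessors are hypothetical names, and the check uses assert(), so it is only active in builds without NDEBUG.

```cpp
#include <cassert>

// Hypothetical wrapper around the question's three members that remembers
// which one was written last, so reading any other member trips an assert
// instead of silently relying on undefined behavior.
class TaggedValue {
public:
    enum class Kind { Int, Long, Char };

    TaggedValue() : kind_(Kind::Int) { data_.var1 = 0; }

    // Each setter writes one member and records it as the active one.
    void set_int(int v)       { data_.var1 = v; kind_ = Kind::Int; }
    void set_long(long int v) { data_.var2 = v; kind_ = Kind::Long; }
    void set_char(char v)     { data_.var3 = v; kind_ = Kind::Char; }

    Kind active() const { return kind_; }

    // Each getter only reads its member if that member is the active one.
    int get_int() const {
        assert(kind_ == Kind::Int && "var1 is not the active member");
        return data_.var1;
    }
    long int get_long() const {
        assert(kind_ == Kind::Long && "var2 is not the active member");
        return data_.var2;
    }
    char get_char() const {
        assert(kind_ == Kind::Char && "var3 is not the active member");
        return data_.var3;
    }

private:
    union {
        int var1;
        long int var2;
        char var3;
    } data_;
    Kind kind_;
};
```

With this wrapper, the question's last read becomes get_long() after set_char('y'), and the assert fires instead of returning whatever bytes happen to be in storage. Since C++17, std::variant provides the same idea as a standard library type.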

EDIT

As pointed out by the great answers below, I made a copying mistake in the comment after the printf statement just before return 0. The correct version is:

printf("a is %ld bits and has value %ld\n",sizeof(a)*8,a.var2); //... has value 123456889, why???


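i.e. the 7 in the hundreds place changes to an 8, because the first 8 bits (the least significant byte on a little-endian machine such as x86) are overwritten with the ASCII value of the 'y' character, i.e. 121 (0111 1001 in binary). That replaces the previous byte value 21 and so adds 100 to the number. I'll leave the original comment in the code above, though, to stay coherent with the great discussion that resulted from it.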
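The arithmetic behind the corrected value can be checked with plain integer operations, without reading any union member. This is a sketch under two stated assumptions: a little-endian target, where the char member shares the least significant byte of the long, and ASCII, where 'y' is 121. `replace_low_byte` is a hypothetical name.

```cpp
// Hypothetical helper: clears the low 8 bits of value and ORs in the new byte.
// Models a little-endian target (e.g. x86-64), where var3 overlaps the least
// significant byte of var2. Assumes ASCII, so 'y' == 121 == 0x79.
constexpr unsigned long replace_low_byte(unsigned long value, unsigned char byte) {
    return (value & ~0xFFUL) | byte;
}

// 123456789 == 0x075BCD15: the low byte 0x15 (21) becomes 0x79 (121, 'y'),
// giving 0x075BCD79 == 123456889, i.e. the value grows by 121 - 21 = 100.
static_assert(replace_low_byte(123456789UL, 'y') == 123456889UL,
              "overwriting the low byte with 'y' adds 100");
```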

Answer

The fun thing about undefined behavior is that it's very specifically not the same as "random" behavior. A compiler usually settles on one concrete way of handling a given piece of undefined code and tends to produce the same result every time, even though nothing requires it to.

Case in point: IDEOne has its own interpretation of this code: http://ideone.com/HO5id6

a is 32 bits and has value 10
a is 32 bits and has value 123456789
a is 32 bits and has value y
a is 32 bits and has value 123456889

You might notice something kind of funny happened there (setting aside the fact that for IDEOne's compiler, long int is 32 bits, not 64). The fourth output line still reads much like the second, but the value has actually changed slightly, from 123456789 to 123456889. What appears to have happened is that writing 'y' set only the byte that var3 occupies and left all the other bits alone. I got similar behavior when I switched long int to long long int.

You may want to check whether, in your example, the fourth output line is really identical to the second. I'm a little skeptical that it is.

At any rate, to answer your specific question, the TL;DR is that in GCC, writing to a union member only alters the bytes that member occupies; the remaining bits are not guaranteed to be changed or cleared. And of course, as with anything UB-related, don't assume that any other compiler (or even a later version of the same compiler!) will behave the same way.
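If you do want to look at the bytes of one type through another, copying them with std::memcpy is well-defined in C++11 for trivially copyable types such as long int (C++20 also adds std::bit_cast for whole-object copies). The sketch below reproduces the effect described above without reading an inactive union member. `overwrite_first_byte` is a hypothetical name, and which byte index 0 refers to still depends on the target's byte order.

```cpp
#include <cstring>

// Hypothetical helper: mimics "write a char into the first byte, then read the
// long" by copying object representations with std::memcpy instead of reading
// an inactive union member.
long int overwrite_first_byte(long int value, char c) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // copy value's bytes out
    bytes[0] = static_cast<unsigned char>(c);   // the byte a char member would share
    long int result;
    std::memcpy(&result, bytes, sizeof result); // read the modified bytes back as a long
    return result;
}
```

On a little-endian machine such as x86-64, overwrite_first_byte(123456789L, 'y') returns 123456889, the same value IDEOne printed. On a big-endian machine, byte 0 is the most significant byte, so the result would differ.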