Luchian Grigore Luchian Grigore - 2 months ago 16
C++ Question

Accessing inactive union member and undefined behavior?

I was under the impression that accessing an

union
member other than the last one set is UB, but I can't seem to find a solid reference (other than answers claiming it's UB but without any support from the standard).

So, is it undefined behavior?

Answer

The confusion is that C explicitly permits type-punning through a union, whereas C++ () has no such permission.

6.5.2.3 Structure and union members

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

The situation with C++:

9.5 Unions [class.union]

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

C++ later has language permitting the use of unions containing structs with common initial sequences; this doesn't however permit type-punning.

To determine whether union type-punning is allowed in C++, we have to search further. Recall that is a normative reference for C++11 (and C99 has similar language to C11 permitting union type-punning):

3.9 Types [basic.types]

4 - The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 42
42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

It gets particularly interesting when we read

3.8 Object lifetime [basic.life]

The lifetime of an object of type T begins when: — storage with the proper alignment and size for type T is obtained, and — if the object has non-trivial initialization, its initialization is complete.

So for a primitive type (which ipso facto has trivial initialization) contained in a union, the lifetime of the object encompasses at least the lifetime of the union itself. This allows us to invoke

3.9.2 Compound types [basic.compound]

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

Assuming that the operation we are interested in is type-punning i.e. taking the value of a non-active union member, and given per the above that we have a valid reference to the object referred to by that member, that operation is lvalue-to-rvalue conversion:

4.1 Lvalue-to-rvalue conversion [conv.lval]

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The question then is whether an object that is a non-active union member is initialized by storage to the active union member. As far as I can tell, this is not the case and so although if:

  • a union is copied into char array storage and back (3.9:2), or
  • a union is bytewise copied to another union of the same type (3.9:3), or
  • a union is accessed across language boundaries by a program element conforming to ISO/IEC 9899 (so far as that is defined) (3.9:4 note 42), then

the access to a union by a non-active member is defined and is defined to follow the object and value representation, access without one of the above interpositions is undefined behaviour. This has implications for the optimisations allowed to be performed on such a program, as the implementation may of course assume that undefined behaviour does not occur.

That is, although we can legitimately form an lvalue to a non-active union member (which is why assigning to a non-active member without construction is ok) it is considered to be uninitialized.

Comments