user2485710 - 1 month ago
C++ Question

What is the difference between a properly defined union and a reinterpret_cast?

Can you propose at least one scenario where there is a substantial difference between

union {
   T var_1;
   U var_2;
};


and

var_2 = reinterpret_cast<U>(var_1);


?

The more I think about this, the more they look like the same thing to me, at least from a practical viewpoint.

One difference I found is that a union is as large as its largest member, whereas the reinterpret_cast as described in this post can lead to truncation, so the plain old C-style union seems even safer than the newer C++ cast.
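The size part of that observation can be checked with `sizeof`. This is a minimal sketch: the union `Storage`, its member types and the helper `storage_size` are hypothetical choices for illustration, and the exact amount of padding is implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>

// A union reserves storage for its largest member (plus any padding needed
// for alignment), so every member fits in the same bytes.
union Storage {
    std::int64_t i;
    double       f;
    char         small[3];
};

static_assert(sizeof(Storage) >= sizeof(std::int64_t), "i fits");
static_assert(sizeof(Storage) >= sizeof(double),       "f fits");
static_assert(sizeof(Storage) >= sizeof(char[3]),      "small fits");
static_assert(alignof(Storage) >= alignof(double),     "aligned for f");

// Hypothetical helper so the size can also be inspected at run time.
constexpr std::size_t storage_size() { return sizeof(Storage); }
```

A `reinterpret_cast`, by contrast, allocates no storage of its own: it only changes how an existing object or pointer is viewed.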

Can you outline the differences between these two?

Answer

Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.

From the standard's point of view, reinterpret_cast is only guaranteed to work for round-trip conversions, and only if the alignment requirements of the intermediate pointer type are not stricter than those of the source type. You are not allowed (*) to write through one pointer type and read through another.
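A minimal sketch of the round-trip guarantee: a pointer converted to a type whose alignment is no stricter, and then back, compares equal to the original. The function names are hypothetical, and `std::uintptr_t` is an optional type in the standard, although mainstream platforms provide it.

```cpp
#include <cstdint>

// Round trip through char*, whose alignment requirement (1) is no stricter
// than int's, so converting back yields the original pointer value.
int* roundtrip_via_char(int* p) {
    char* tmp = reinterpret_cast<char*>(p);  // only an intermediate value here
    return reinterpret_cast<int*>(tmp);      // back to the original pointer
}

// Round trip through an integer type large enough to hold a pointer.
int* roundtrip_via_integer(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int*>(bits);
}
```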

At the same time, the standard imposes a similar restriction on unions: it is undefined behavior to read from a union member other than the active one (the member that was last written to)(+).
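One common way to stay within the union rule is to record which member is active and read only that one. The sketch below uses a hypothetical `Number` type; it converts between values rather than reinterpreting bits, so every read stays within what the standard guarantees.

```cpp
#include <cstdint>

// Minimal tagged union: `kind` records which member was written last,
// so reads only ever go through the active member.
struct Number {
    enum class Kind { Integer, Real } kind;
    union {
        std::int64_t i;
        double       f;
    };

    void set_integer(std::int64_t v) { i = v; kind = Kind::Integer; }  // i becomes active
    void set_real(double v)          { f = v; kind = Kind::Real; }     // f becomes active

    // Reads the active member and converts its value; no bit reinterpretation.
    double as_double() const {
        return kind == Kind::Integer ? static_cast<double>(i) : f;
    }
};
```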

Yet compilers often provide additional guarantees for the union case: all compilers I know of (VS, g++, clang++, xlC_r, Intel, Solaris CC) guarantee that you can read from a union through an inactive member and that the result has exactly the same bits set as those that were written through the active member.

This is particularly important with aggressive optimizations when reading from the network:

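#include <cstdint>   // std::int64_t
#include <cstring>   // std::memcpy
// ntohll() is not part of standard C++ or POSIX; it is assumed to be supplied
// by the platform. [1] and [2] are alternatives: compile only one of them.

double ntohdouble(const char *buffer) {          // [1]
   union {
      std::int64_t   i;
      double         f;
   } data;
   std::memcpy(&data.i, buffer, sizeof(std::int64_t)); // copy the raw network bytes
   data.i = ntohll(data.i);                            // byte-swap through the integer member
   return data.f;                                      // read the inactive member (compiler guarantee)
}
double ntohdouble(const char *buffer) {          // [2]
   std::int64_t data;
   double       dbl;
   std::memcpy(&data, buffer, sizeof(std::int64_t));
   data = ntohll(data);
   dbl = *reinterpret_cast<double*>(&data);            // reads an int64_t object as double: UB
   return dbl;
}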

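The implementation in [1] is sanctioned by all compilers I know (gcc, clang, VS, Sun, IBM, HP), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. The reason is the strict aliasing rule: the compiler may assume that a double* and an int64_t* never refer to the same object, so the store to data and the load through the cast pointer are treated as independent. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.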
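For comparison, here is a sketch of a variant that relies on neither union punning nor `reinterpret_cast`: it assembles the 64-bit value from the bytes and then copies its object representation into a `double` with `std::memcpy`, which is well defined. It assumes an 8-byte IEEE-754 `double` sent in big-endian (network) order and a host that stores `double` and `std::uint64_t` with the same byte order, which holds on mainstream platforms. The name `ntohdouble_portable` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical portable variant: no union punning, no reinterpret_cast.
double ntohdouble_portable(const char *buffer) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "need a 64-bit double");

    // Assemble the integer most significant byte first; this is independent
    // of the host's endianness, so no ntohll() is needed.
    std::uint64_t bits = 0;
    for (int k = 0; k < 8; ++k) {
        bits = (bits << 8) | static_cast<unsigned char>(buffer[k]);
    }

    // Copying the object representation is well defined, and compilers
    // typically reduce this memcpy to a single register move.
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

In C++20, `std::bit_cast<double>(bits)` from `<bit>` expresses the final copy directly.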


(*) With the exception that you are always allowed to read through a char* or unsigned char* (and, since C++17, std::byte*) regardless of what the real object (original pointer type) was.
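That exception is what makes byte-level inspection legal. A minimal sketch with a hypothetical helper `byte_of`, which reads the object representation of a `double` through `const unsigned char*`:

```cpp
#include <cstddef>

// Reading any object through unsigned char* is explicitly permitted, so this
// inspects the bytes of a double without undefined behavior.
// Returns the byte at position `index` (0 = lowest address).
unsigned char byte_of(const double& value, std::size_t index) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[index];  // caller must keep index < sizeof(double)
}
```

Which byte holds which part of the value depends on the platform's endianness.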

(+) Again with an exception: if the union holds standard-layout structs that share a common initial sequence, you can read that common prefix through any of those members, not only through the active one.
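A sketch of the common-initial-sequence exception, using hypothetical standard-layout types `Circle` and `Square` whose leading members have the same types:

```cpp
// Both structs start with an int, so that member belongs to their common
// initial sequence. The type and function names are hypothetical.
struct Circle { int kind; double radius; };
struct Square { int kind; double side; };

union Shape {
    Circle circle;
    Square square;
};

// Writes through `circle` (making it the active member), then reads the
// shared `kind` field through `square`, which the exception allows.
int kind_via_other_member(int kind) {
    Shape s;
    s.circle = Circle{kind, 1.0};
    return s.square.kind;
}
```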