carsten - 1 year ago 93
C++ Question

# ULP comparison code

The following code snippet is scattered all over the web and seems to be used in multiple different projects with very few changes:

```cpp
union Float_t {
    Float_t(float num = 0.0f) : f(num) {}
    // Portable extraction of components.
    bool Negative()    const { return (i >> 31) != 0; }
    int  RawMantissa() const { return i & ((1 << 23) - 1); }
    int  RawExponent() const { return (i >> 23) & 0xFF; }

    int i;
    float f;
};

inline bool AlmostEqualUlpsAndAbs(float A, float B, float maxDiff, int maxUlpsDiff)
{
    // Check if the numbers are really close -- needed
    // when comparing numbers near zero.
    float absDiff = std::fabs(A - B);
    if (absDiff <= maxDiff)
        return true;

    Float_t uA(A);
    Float_t uB(B);

    // Different signs means they do not match.
    if (uA.Negative() != uB.Negative())
        return false;

    // Find the difference in ULPs.
    return (std::abs(uA.i - uB.i) <= maxUlpsDiff);
}
```

See, for example here or here or here.

However, I don't understand what is going on here. To my (maybe naive) understanding, the floating-point member variable `f` is initialized in the constructor, but the integer member `i` is not.

I'm not terribly familiar with the binary operators that are used here, but I fail to understand how accesses of `uA.i` and `uB.i` produce anything but random numbers, given that no line in the code actually connects the values of `f` and `i` in any meaningful way.

If somebody could enlighten me on why (and how) exactly this code produces the desired result, I would be very delighted!

A lot of undefined behaviour is being exploited here. The first assumption is that the members of a union can be accessed in place of each other, which in C++ is, in itself, UB: you are reading a member other than the one that was most recently written. Furthermore, the coder assumes that `sizeof(int) == sizeof(float)`, that floats have a given length of mantissa and exponent, that all union members start at offset zero, and that the binary representation of a float lines up with the binary representation of an int in a very specific way. In short, this will work as long as you're on a platform like x86, have specific int and float types, and you say a prayer at every sunrise and sunset.
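To make those assumptions visible, here is a minimal sketch that checks them at compile time and reads the bits through `std::memcpy`, which is well defined, instead of through the union. The names `FloatBits`, `Negative`, `RawMantissa` and `RawExponent` as free functions are illustrative choices, not part of the original snippet, and the sketch assumes that `float` and integers use the same byte order on the target platform.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Compile-time checks for the assumptions the union trick makes silently.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "float must be exactly 32 bits wide");
static_assert(std::numeric_limits<float>::is_iec559,
              "float must use the IEEE 754 format");
static_assert(std::numeric_limits<float>::digits == 24,
              "expected 23 stored mantissa bits plus the implicit leading bit");

// Hypothetical helper: copy the object representation of a float into an
// unsigned integer. Copying bytes with std::memcpy is well defined, unlike
// reading the inactive member of a union.
inline std::uint32_t FloatBits(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The same component extraction as in Float_t, done on the copied bits.
inline bool Negative(float v)             { return (FloatBits(v) >> 31) != 0; }
inline std::uint32_t RawMantissa(float v) { return FloatBits(v) & ((1u << 23) - 1u); }
inline std::uint32_t RawExponent(float v) { return (FloatBits(v) >> 23) & 0xFFu; }
```

On a platform that passes the checks, `FloatBits(1.0f)` is `0x3F800000`: sign 0, raw exponent 127, mantissa 0. If C++20 is available, `std::bit_cast<std::uint32_t>(value)` from `<bit>` does the same job as the `memcpy` helper.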
What you probably didn't notice is that this is a union, so `int i` and `float f` are usually laid out over the same memory by most compilers. Reading one through the other is, in general, still UB, and you can't safely assume that you will get the same physical bits back without restricting yourself to a specific compiler and a specific architecture. All that's guaranteed is that both members have the same address (but there might be alignment and/or type issues). Assuming that your compiler uses the same physical bits (which is by no means guaranteed by the standard), and that both members start at offset 0 and have the same size, then `i` will represent the binary storage format of `f`... as long as nothing changes in your architecture.

Word of advice? Do not use it unless you have to. Stick to floating-point operations for `AlmostEquals()`; you can implement it that way. Tricks like this belong to the very final pass of optimization, and we usually do that in a separate branch; you shouldn't plan your code around it.
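As a rough illustration of the "stick to floating-point operations" advice, the following sketch compares two floats with an absolute tolerance near zero and a relative tolerance elsewhere, without touching the bit representation. The function name `AlmostEqualRelativeAndAbs` and the default for `maxRelDiff` are assumptions made for this example; suitable tolerances depend on the computation that produced the values.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical float-only comparison: no unions, no bit tricks.
inline bool AlmostEqualRelativeAndAbs(
    float a, float b, float maxDiff,
    float maxRelDiff = std::numeric_limits<float>::epsilon())
{
    // Absolute check, needed when comparing numbers near zero.
    const float diff = std::fabs(a - b);
    if (diff <= maxDiff)
        return true;

    // Relative check: scale the tolerance by the larger magnitude.
    const float largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * maxRelDiff;
}
```

A call such as `AlmostEqualRelativeAndAbs(0.0f, 1e-9f, 1e-6f)` passes through the absolute check, while values far from zero are judged relative to their size. Unlike the ULP version, the relative tolerance here is a fraction of the larger value rather than a count of representable floats, so the two functions will not agree in every case.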