plasmacel - 2 months ago
C++ Question

Interchangeability of IEEE 754 floating-point addition and multiplication

Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard, or, more generally speaking, is there any guarantee that case_add and case_mul always give exactly the same result?

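#include <cstddef>
#include <limits>

// Computes x + x + ... + x (n terms) by repeated addition.
// Note: for n == 0 this returns x, whereas case_mul returns 0.
template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    T result(x);

    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;
    }

    return result;
}

// Computes n * x with a single multiplication.
template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    return x * static_cast<T>(n);
}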

Answer

Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard

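Yes. Both operations have the same exact mathematical result, 2x, and IEEE 754 requires each basic operation to return its exact result correctly rounded, so both must give the same value. In fact 2x is exactly representable unless it overflows, so no rounding happens at all.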

or more generally speaking is there any guarantee that case_add and case_mul always give exactly the same result?

Not generally, no. From what I can tell, it seems to hold for n <= 5:

  • n=3: x+x is exact (i.e. involves no rounding), so (x+x)+x involves only one rounding, at the final step, and that is the same rounding of the exact value 3x that 3*x performs.
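  • n=4: assuming the default rounding mode (round to nearest, ties to even), first take the case where the significand of x is below 4/3, so that 3x lies in the same binade as 2x and needs at most one bit beyond the format:

    • if the last bit of x is 0, then x+x+x is exact, and so the results are equal by the same argument as n=3.
    • if the last 2 bits are 01, then the exact value of x+x+x will have last 2 bits of 1|1 (where | marks the boundary after the final bit in the format), which is a tie and rounds up (to even) to 0|0. The next addition gives an exact result ending in |01, which is rounded down, cancelling out the previous error.
    • if the last 2 bits are 11, then the exact value of x+x+x will have last 2 bits of 0|1, which is a tie and rounds down (to even) to 0|0. The next addition gives an exact result ending in |11, which is rounded up, again cancelling out the previous error.
    • if the significand of x is 4/3 or more, 3x moves up a binade and needs two bits beyond the format, so x+x+x can be inexact even when the last bit of x is 0. Its rounding error is still at most half an ulp of 4x, though, so the last addition rounds back to 4x. The only tie occurs when the last 2 bits of x are 10, and it resolves to 4x because 4x has the same significand as x, whose last bit 0 is even.

    In every case the result is exactly 4x, which is also what 4*x gives.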
  • n=5: since x+x+x+x is exact (it equals 4x by the n=4 case), only the final addition rounds, so it holds for the same reason as n=3.

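For n=6 it fails. For example, take x to be 1.0000000000000002 (the next double after 1.0): then 6*x is 6.000000000000002, whereas x+x+x+x+x+x is 6.000000000000001. The exact value 6x lies exactly halfway between two doubles and rounds to even (upwards), but x+x+x+x+x already rounds 5x down by a quarter of an ulp, which pulls the final sum below the midpoint, so it rounds down instead.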