Gwendal Roué - 5 months ago

C Question

I have one double, and one int64_t. I want to know if they hold exactly the same value, and if converting one type into the other does not lose any information.

My current implementation is the following:

```
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < INT64_MAX)
        && (round(d) == d)
        && (i == (int64_t)d);
}
```

Some sample inputs:

- int64EqualsDouble(0, 0.0) should return 1
- int64EqualsDouble(1, 1.0) should return 1
- int64EqualsDouble(0x3FFFFFFFFFFFFFFF, (double)0x3FFFFFFFFFFFFFFF) should return 0, because 2^62 - 1 can be exactly represented with int64_t, but not with double.
- int64EqualsDouble(0x4000000000000000, (double)0x4000000000000000) should return 1, because 2^62 can be exactly represented in both int64_t and double.
- int64EqualsDouble(INT64_MAX, (double)INT64_MAX) should return 0, because INT64_MAX can not be exactly represented as a double
- int64EqualsDouble(..., 1.0e100) should return 0, because 1.0e100 can not be exactly represented as an int64_t.

Answer

Yes, your solution works correctly, and it does so by design: `int64_t` is represented in two's complement by definition (C99 7.18.1.1:1), and the reasoning below holds on platforms that use something resembling binary IEEE 754 double-precision for the `double` type. It is basically the same as this one.

Under these conditions:

- `d < INT64_MAX` is correct because it is equivalent to `d < (double) INT64_MAX`, and in the conversion to `double`, the number `INT64_MAX`, equal to 0x7fffffffffffffff, rounds up. Thus you want `d` to be strictly less than the resulting `double` to avoid triggering UB when executing `(int64_t)d`.
- On the other hand, `INT64_MIN`, being -0x8000000000000000, is exactly representable, meaning that a `double` that is equal to `(double)INT64_MIN` can be equal to some `int64_t` and should not be excluded (and such a `double` can be converted to `int64_t` without triggering undefined behavior).
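The two boundary cases can be checked directly. The following is a minimal sketch, assuming IEEE 754 binary64 `double` with round-to-nearest conversions; the names `in_int64_range` and `check_boundaries` are hypothetical and not part of the question's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 1 if d lies in the range where (int64_t)d is defined.
   Assumes IEEE 754 binary64 and round-to-nearest integer-to-double conversion. */
int in_int64_range(double d) {
    /* (double)INT64_MAX rounds up to 2^63, so the upper bound must be strict.
       (double)INT64_MIN is exactly -2^63, so the lower bound is inclusive. */
    return d >= (double)INT64_MIN && d < (double)INT64_MAX;
}

/* Self-check of the boundary facts used in the reasoning above. */
void check_boundaries(void) {
    assert((double)INT64_MAX == 0x1.0p63);    /* 2^63 - 1 rounded up to 2^63 */
    assert((double)INT64_MIN == -0x1.0p63);   /* -2^63 converted exactly */
    assert(!in_int64_range(0x1.0p63));        /* excluded: (int64_t) would overflow */
    assert(in_int64_range(-0x1.0p63));        /* included: converts to INT64_MIN */
    assert((int64_t)-0x1.0p63 == INT64_MIN);
}
```

A NaN argument fails both comparisons, so the range check also keeps NaN away from the cast.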

It goes without saying that since we have specifically used the assumptions about two's complement for integers and binary floating-point, the correctness of the code is not guaranteed by this reasoning on platforms that differ. Take a platform with binary 64-bit floating-point and a 64-bit ones' complement integer type `T`. On that platform `T_MIN` is `-0x7fffffffffffffff`. The conversion to `double` of that number rounds down, resulting in `-0x1.0p63`. On that platform, using your program as it is written, using `-0x1.0p63` for `d` makes the first three conditions true, resulting in undefined behavior in `(T)d`, because overflow in the conversion from floating-point to integer is undefined behavior.

If you have access to full IEEE 754 features, there is a shorter solution:

```
#include <fenv.h>
…
#pragma STDC FENV_ACCESS ON
feclearexcept(FE_INEXACT), f == i && !fetestexcept(FE_INEXACT)
```

This solution takes advantage of the conversion from integer to floating-point setting the INEXACT flag iff the conversion is inexact (that is, if `i` is not representable exactly as a `double`).

The INEXACT flag remains unset and `f` is equal to `(double)i` if and only if `f` and `i` represent the same mathematical value in their respective types.

This approach requires the compiler to have been warned that the code accesses the FPU's state, normally with `#pragma STDC FENV_ACCESS ON`, but that's typically not supported and you have to use a compilation flag instead.