Potatoswatter - 12 days ago
C Question

Round floating-point value to e.g. single precision

C and C++ provide floating-point data types of several widths, but they leave precision unspecified. The compiler is free to use idealized arithmetic to simplify expressions, to use double precision in computing an expression over float values, or to use a double-precision register to keep the value of a float variable or common subexpression.

Correct me if I'm wrong, but it's even legal to hoist a float in memory into a double-precision register, so storing a value and then loading it back doesn't necessarily truncate bits.

What is the safest, most portable way to convert a number to a lower precision? Ideally, it should be efficient too, compiling to cvtsd2ss on SSE2. (So, while volatile may be an answer, I'd prefer something better.)

Edit: Summarizing some of the comments and findings:


  • Wider precision for intermediate results is always fair game.

  • Expression simplification is allowed in C++, and in C given FP_CONTRACT on.

  • Using double precision for a single-precision float is not allowed (in C or C++).



However, some compilers (particularly GCC on x86-32) omit some of these precision conversions, which the standard does not allow.

Answer

C99 5.2.4.2.2p8 explicitly says that

assignment and cast [..] remove all extra range and precision

So, if you want to limit the range and precision to that of a float, just cast to float, or assign to a float variable.

You can even do stuff like (double)((float)d) (with extra parentheses to make sure humans read it correctly), limiting a variable d to float precision and range, then casting it back to double. (A standard C compiler is NOT allowed to optimize that away even if d is a double; it must limit the precision and range to that of a float.)
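As a minimal sketch of the cast and assignment forms described above: the function names are hypothetical, made up for illustration. The volatile variant is only a fallback for a compiler that fails to honor the rule (the case raised in the question); it typically forces a store and a reload, so it is likely to be slower than a plain conversion.

```c
/* Hypothetical helpers, not part of any library. */

/* Round a double to float precision and range, then widen it back.
   Per C99 5.2.4.2.2p8 the cast must remove extra range and precision. */
double round_to_float(double d)
{
    return (double)((float)d);
}

/* The same rounding expressed as an assignment to a float variable. */
double round_to_float_assign(double d)
{
    float f = d;   /* assignment also removes extra range and precision */
    return f;
}

/* Fallback for compilers that skip the conversion: a volatile object
   must actually be written and read back as a float. */
double round_to_float_volatile(double d)
{
    volatile float f = (float)d;
    return f;
}
```

Assuming float is IEEE 754 binary32 with round-to-nearest-even, round_to_float(16777217.0) gives 16777216.0, because 2^24 + 1 needs 25 significant bits and a float holds only 24.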

I've used this in practical implementations of e.g. the Kahan summation algorithm, where it lets the C compiler optimize very aggressively without the risk of invalidating the algorithm.
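For illustration, here is a sketch of Kahan summation over floats with every intermediate result explicitly cast to float. This is not the exact code referred to above; the name kahan_sum and its signature are assumptions. It also assumes a conforming compiler that does not reassociate floating-point arithmetic, so options that relax those rules would defeat it.

```c
#include <stddef.h>

/* Hypothetical helper: compensated (Kahan) sum of n floats. */
float kahan_sum(const float *values, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;   /* running compensation for lost low-order bits */

    for (size_t i = 0; i < n; i++) {
        /* Each cast rounds the intermediate result to float, so the
           compensation is computed at the same precision as sum even
           if the compiler evaluates expressions in a wider format. */
        float y = (float)(values[i] - c);
        float t = (float)(sum + y);
        c = (float)((float)(t - sum) - y);
        sum = t;
    }
    return sum;
}
```

Without the casts, a compiler that keeps t - sum in a wider format could compute a compensation term that no longer matches the rounding error actually made when sum was stored as a float.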