cesss - 2 months ago
C++ Question

Correctly converting floating point in C++

What would be the correct/recommended way of telling the C++ compiler "only warn me of floating point conversions that I'm not aware of"?

In C, I would enable the warnings related to floating point conversions, and then I would use explicit C-style casts to silence warnings related to the conversions that are under control.

For example, computing a*a*a - b*b is quite prone to overflow in single precision floating point, so you might wish to compute it in double precision and only convert to single precision at the end:

double a = 443620.52;
double b = 874003.01;
float c = (float)(a*a*a - b*b);

The above C-style explicit cast would silence the compiler warning about the conversion from double to float.

Reading the C++ documentation about casts, I came to the conclusion that the correct way of doing this in C++ would be as follows:

double a = 443620.52;
double b = 874003.01;
float c = static_cast<float>(a*a*a - b*b);

But, is this really the correct way of doing this in C++?

I understand the rationale behind the static_cast syntax being ugly on purpose, so that you avoid casts completely if possible.

Yes, I can omit the explicit cast to float. But then I need to disable the compiler warnings about precision loss (otherwise I'd get a number of irrelevant warnings that would make it difficult to notice the really relevant ones). And if I disable the floating-point-related compiler warnings, I lose the chance of being warned when I'm mistakenly losing precision elsewhere in the code.

So, what's the correct approach for floating point conversions in C++?



float c = static_cast<float>(a*a*a - b*b);

is the correct way of explicitly casting to float in C++. You can also do:

float c = (float)(a*a*a - b*b);

but using a "C-style" cast like that is bad style: static_cast permits fewer kinds of conversion, so it hides fewer errors than a C-style cast does.
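To illustrate the difference, here is a minimal sketch (the helper names are made up for this example). For a plain value conversion the two casts behave identically, but a C-style cast will also quietly act as a const_cast plus reinterpret_cast when that is what it takes to compile, whereas static_cast refuses:

```cpp
// Hypothetical helper names, for illustration only.

// Value conversion: static_cast and the C-style cast do the same thing here.
inline float narrow_static(double d) { return static_cast<float>(d); }
inline float narrow_c_style(double d) { return (float)d; }

// Pointer conversion: the C-style cast compiles even though it drops const
// and reinterprets the pointee type. The result must not be dereferenced.
inline const void* pointer_c_style(const double* p) {
    float* q = (float*)p;   // silently acts as const_cast + reinterpret_cast
    return q;
}

// The static_cast spelling of the same pointer conversion is rejected:
// float* q = static_cast<float*>(p);   // error: invalid static_cast
```

So if a refactoring accidentally turns the operand into a pointer, the static_cast version stops compiling, while the C-style version keeps compiling and does something quite different.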

Alternatively, if you are doing this a lot, you can define a function:

inline float flt(double d) { return static_cast<float>(d); }

and then you can write:

float c = flt(a*a*a - b*b);

which is even more compact than the original C cast. With optimization enabled, the call should be inlined, so only the conversion itself remains.
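Putting it together, a rough sketch of how this might look in practice, assuming GCC or Clang with -Wfloat-conversion (or the broader -Wconversion) enabled. The function cube_minus_square is a hypothetical name used only for this example:

```cpp
// Build with e.g.  g++ -Wfloat-conversion  (or -Wconversion) so that
// implicit double-to-float conversions are reported.

// Explicit, intentional narrowing: no warning is emitted for this.
inline float flt(double d) { return static_cast<float>(d); }

// Hypothetical function computing the expression from the question.
inline float cube_minus_square(double a, double b) {
    // float c = a*a*a - b*b;      // implicit: would trigger the warning
    return flt(a*a*a - b*b);       // explicit: silent, and intent is visible
}
```

Calling cube_minus_square(443620.52, 874003.01) does the arithmetic in double and narrows once at the end. Any double-to-float conversion that does not go through flt (or a static_cast) still produces a warning, which is the "only warn me about the ones I'm not aware of" behaviour asked for.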