gsamaras - 1 year ago 85

C Question

From a .c file of another guy, I saw this:

`const float c = 0.70710678118654752440084436210485f;`

where he wants to avoid the computation of `sqrt(1/2)`. Can this really be stored somehow with plain `C/C++`?

I am using C++, but I do not believe that the precision difference between these two languages is too big (if any), that's why I did not test it.

So, I wrote these few lines, to have a look at the behaviour of the code:

std::cout << "Number: 0.70710678118654752440084436210485\n";

const float f = 0.70710678118654752440084436210485f;

std::cout << "float: " << std::setprecision(32) << f << std::endl;

const double d = 0.70710678118654752440084436210485; // no f extension

std::cout << "double: " << std::setprecision(32) << d << std::endl;

const double df = 0.70710678118654752440084436210485f;

std::cout << "doublef: " << std::setprecision(32) << df << std::endl;

const long double ld = 0.70710678118654752440084436210485;

std::cout << "l double: " << std::setprecision(32) << ld << std::endl;

const long double ldl = 0.70710678118654752440084436210485l; // l suffix!

std::cout << "l doublel: " << std::setprecision(32) << ldl << std::endl;

The output is this:

`* ** ***`

v v v

Number: 0.70710678118654752440084436210485 // 32 decimal digits

float: 0.707106769084930419921875 // 24 decimal digits

double: 0.70710678118654757273731092936941

doublef: 0.707106769084930419921875 // same as float

l double: 0.70710678118654757273731092936941 // same as double

l doublel: 0.70710678118654752438189403651592 // suffix l

where `*`, `**` and `***` mark the last accurate digit of `float`, `double` and `long double`, respectively.

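The output of `double` has 32 decimal digits, since that is the precision I set on `std::cout`. The `float` output has only 24, which seems consistent with this quote: `float has 24 binary bits of precision, and double has 53.`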

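I would expect the last output to be the same as the one before it, i.e. that the `f` suffix would not prevent the number from becoming a `double`. I think that when I write

`const double df = 0.70710678118654752440084436210485f;`

what happens is that first the number becomes a `float` and is then stored as a `double`, so the `double` gains no extra precision.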

Am I correct?
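The conversion described above can be checked directly. This is a minimal sketch, assuming a platform where `double` is wider than `float` (as with IEEE 754); the names `f`, `d` and `df` follow the snippet in the question, and `print_widening` is a hypothetical helper.

```cpp
#include <iomanip>
#include <iostream>

// The literal carries an f suffix, so it is rounded to float first.
constexpr float  f  = 0.70710678118654752440084436210485f;
// No suffix: the literal is a double and is rounded only once.
constexpr double d  = 0.70710678118654752440084436210485;
// A float literal widened to double: the widening is exact,
// so the digits lost in the rounding to float do not come back.
constexpr double df = 0.70710678118654752440084436210485f;

// Hypothetical helper: prints the three values side by side.
void print_widening() {
    std::cout << std::setprecision(32)
              << "float:   " << f  << '\n'
              << "double:  " << d  << '\n'
              << "doublef: " << df << '\n';
}
```

Calling `print_widening()` from `main` should show `doublef` matching `float` rather than `double`, as in the output listed in the question.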

From this answer I found some relevant information:

`float x = 0` has an implicit typecast from int to float.

`float x = 0.0f` does not have such a typecast.

`float x = 0.0` has an implicit typecast from double to float.

[EDIT]

About

`__float128`


Answer Source

From the standard:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

So you can see your issue with this question: the standard doesn't actually say how precise floats are.

In terms of standard implementations, you need to look at IEEE 754, which means the other two answers from Irineau and Davidmh are perfectly valid approaches to the problem.
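Since the standard leaves the precision implementation-defined, one way to see what a given compiler actually provides is `std::numeric_limits`. This is a minimal sketch; `print_precision` and `print_all_precisions` are hypothetical helper names, not something from the question or the standard.

```cpp
#include <iostream>
#include <limits>

// Hypothetical helper: reports what this implementation provides for T.
template <typename T>
void print_precision(const char* name) {
    using lim = std::numeric_limits<T>;
    std::cout << name
              << ": binary digits = " << lim::digits
              << ", guaranteed decimal digits = " << lim::digits10
              << ", digits needed to round-trip = " << lim::max_digits10
              << ", IEEE 754 = " << std::boolalpha << lim::is_iec559 << '\n';
}

// Prints the report for the three standard floating point types.
void print_all_precisions() {
    print_precision<float>("float");
    print_precision<double>("double");
    print_precision<long double>("long double");
}
```

On an IEEE 754 implementation, `float` and `double` typically report 24 and 53 binary digits; the figures for `long double` vary between platforms.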

As to suffix letters to indicate type, again looking at the standard:

The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double.

So your attempt to create a `long double` will just have the same precision as the `double` literal you are assigning to it unless you use the `L` suffix.
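The suffix rule can be confirmed at compile time. This is a minimal sketch; the variable names `from_double` and `from_long_double` are hypothetical and only serve the illustration.

```cpp
#include <type_traits>

// No suffix: the literal is a double.
static_assert(std::is_same<decltype(0.5), double>::value,
              "unsuffixed literal is double");
// f or F: float.
static_assert(std::is_same<decltype(0.5f), float>::value,
              "f suffix gives float");
// l or L: long double.
static_assert(std::is_same<decltype(0.5L), long double>::value,
              "L suffix gives long double");

// A double literal is rounded to double before it is widened,
// so the extra digits are already gone when it reaches the long double.
constexpr long double from_double = 0.70710678118654752440084436210485;
// The L suffix makes the literal a long double from the start.
constexpr long double from_long_double = 0.70710678118654752440084436210485L;
```

Whether the two constants actually differ depends on the platform: where `long double` has the same representation as `double`, they hold the same value.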

I understand that some of these answers may not seem satisfactory, but there is a lot of background reading to be done on the relevant standards before you can dismiss answers. This answer is already longer than intended so I won't try and explain everything here.

And as a final note: Since the precision is not clearly defined, why not have a constant that's longer than it needs to be? Seems to make sense to always define a constant that is precise enough to always be representable regardless of type.
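One way to follow this suggestion is to write the constant once, with more digits than any of the types can hold, and convert it to whichever type is in use. This is a minimal sketch, assuming C++14 variable templates; the name `inv_sqrt2` and the `c_*` constants are hypothetical.

```cpp
// Written with more digits than any of the three types can hold, and with
// the L suffix so that no precision is lost before the conversion to T.
template <typename T>
constexpr T inv_sqrt2 = static_cast<T>(0.70710678118654752440084436210485L);

// Each instantiation is rounded to its own type.
constexpr float       c_float  = inv_sqrt2<float>;
constexpr double      c_double = inv_sqrt2<double>;
constexpr long double c_ldbl   = inv_sqrt2<long double>;
```

Going through `long double` rounds the value twice (decimal to `long double`, then to `T`), which in general can differ in the last bit from a literal written with the matching suffix; for this particular constant the two routes are expected to agree.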
