John Am - 7 months ago 23

C Question

I'm trying to eliminate all floating-point computations in an embedded application, and I need to scale (multiply) a signed 32-bit `long` integer by

`0.0000000004656f`

The context is

`( pulse[i] * ( triosc[i] * 0.0000000004656f ) )`

Both `pulse[i]` and `triosc[i]` are signed 32-bit `long` values, so I need my `triosc[i]` scaled to a value between `0.0f` and `1.0f`.

`saw_x2[i] = (long)( pulse[i] * (triosc[i] * 0.0000000004656f) );`

`sine_osc[i] = (long)( ((triangle2[i] * (saw_x2[i] * 0.0000000004656f))) * (pulse[i] * 0.0000000004656f) ) << 2;`

`return (sine_osc[i]);`

Answer

The fixed-point values in `pulse[i]` and `triosc[i]` are signed quantities expressed in units of 2^{-31}. The mathematical values are **pulse[i] / 2^{31}** and **triosc[i] / 2^{31}**, so their product, expressed in the same units, is the integer product divided by 2^{31}. That is what your expression `pulse[i] * (triosc[i] * 0.0000000004656f)` approximates, but the floating-point constant is not precise enough: 2^{-31} is about 4.6566e-10, so `0.0000000004656f` is off by roughly 0.013%. It would be more precise to write `pulse[i] * (triosc[i] / 2147483648.F)`, but the result would still lose precision because the `float` representation has only 23 explicit bits of mantissa, fewer than the 31 value bits of the operands. Performing the multiplication in integer arithmetic with a 64-bit intermediate step is actually more precise.
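To make the size of the constant's error concrete, here is a minimal sketch. The function names `scale_float` and `scale_exact` are hypothetical, and `int32_t`/`int64_t` stand in for the 32-bit `long` of the question.

```c
#include <stdint.h>

/* Float version from the question: scales by the rounded constant.
 * (Hypothetical helper name, for illustration only.) */
int32_t scale_float(int32_t a, int32_t b)
{
    return (int32_t)(a * (b * 0.0000000004656f));
}

/* Reference version: exact 64-bit product divided by 2^31
 * (integer division truncates toward zero). */
int32_t scale_exact(int32_t a, int32_t b)
{
    return (int32_t)((int64_t)a * b / 2147483648LL);
}
```

With both inputs at `1 << 30` (0.5 in these units), `scale_exact` returns 536870912 (0.25), while `scale_float` comes out tens of thousands of units lower because the constant is slightly smaller than 2^{-31}.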

The integer multiplication can be written this way:

```
((int64_t)pulse[i] * triosc[i]) >> 31
```

or equivalently:

```
((long long)pulse[i] * triosc[i]) >> 31
```
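Wrapped in a helper, the multiplication might look like the sketch below. The name `q31_mul` is hypothetical; the code assumes the compiler implements `>>` on negative signed values as an arithmetic shift, which common compilers do but the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Multiply two signed values expressed in units of 2^-31.
 * The 64-bit intermediate holds the full product; shifting right by 31
 * brings it back to units of 2^-31.
 * Assumption: >> on a negative int64_t is an arithmetic shift. */
int32_t q31_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 31);
}
```

For example, `q31_mul(1 << 30, 1 << 30)` multiplies 0.5 by 0.5 and returns `1 << 29`, which is 0.25 in the same units.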

**EDIT**

You really should use types from `<stdint.h>` to avoid making assumptions about the size of `long`. It is 32 bits on your current system, but it may be 64 bits on the next hardware. Here is how you can rewrite the expressions:

```
int32_t saw_x2[SIZE];
int32_t pulse[SIZE];
int32_t triosc[SIZE];
int32_t triangle2[SIZE];
int32_t sine_osc[SIZE];
...
saw_x2[i] = (int32_t)(((int64_t)pulse[i] * triosc[i]) >> 31);
int64_t temp = ((int64_t)triangle2[i] * saw_x2[i]) >> 31;
sine_osc[i] = (int32_t)(((temp * pulse[i]) >> 31) << 2);
return sine_osc[i];
```

Note however that if any of these values becomes negative, right shifting is not guaranteed to produce the correct result: in C, the result of right shifting a negative signed value is implementation-defined. Dividing by `2147483648` is the portable method, but it may produce less efficient code:

```
saw_x2[i] = (int32_t)((int64_t)pulse[i] * triosc[i] / 2147483648);
int64_t temp = (int64_t)triangle2[i] * saw_x2[i] / 2147483648;
sine_osc[i] = (int32_t)((temp * pulse[i] / 2147483648) * 4);  /* * 4, not << 2: left shifting a negative value is undefined */
return sine_osc[i];
```
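The two approaches differ only for negative products. A small sketch, with hypothetical helper names `scale_shift` and `scale_div`: division truncates toward zero (guaranteed since C99), whereas an arithmetic shift, where the compiler provides one, rounds toward negative infinity.

```c
#include <stdint.h>

/* Shift version: implementation-defined for negative x;
 * with an arithmetic shift it rounds toward negative infinity. */
int64_t scale_shift(int64_t x)
{
    return x >> 31;
}

/* Division version: well defined for all x, truncates toward zero. */
int64_t scale_div(int64_t x)
{
    return x / 2147483648LL;
}
```

Both return 512 for `x = 1LL << 40`. For `x = -1`, `scale_div` returns 0, while `scale_shift` typically returns -1 on compilers that use an arithmetic shift.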

Also, since you multiply by 4 in the last step, you would get 2 more bits of precision by dividing by 2^{29} instead:

```
sine_osc[i] = (int32_t)(temp * pulse[i] / 536870912);
```
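A tiny product shows the two extra bits. In the sketch below (the helper names `scale_then_mul4` and `scale_direct` are hypothetical), the first function divides by 2^{31} and then multiplies by 4, written as `* 4` rather than `<< 2` because left shifting a negative value is undefined in C; the second divides by 2^{29} in one step.

```c
#include <stdint.h>

/* Divide by 2^31, then multiply by 4: the low two bits are always zero. */
int32_t scale_then_mul4(int64_t temp, int32_t pulse)
{
    return (int32_t)(temp * pulse / 2147483648LL * 4);
}

/* Divide by 2^29 directly: keeps the two bits discarded above. */
int32_t scale_direct(int64_t temp, int32_t pulse)
{
    return (int32_t)(temp * pulse / 536870912LL);
}
```

With `temp = 3` and `pulse = 1 << 29`, `scale_then_mul4` returns 0 and `scale_direct` returns 3.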