JonyK - 3 months ago 46

Python Question

When performing math operations on float16 NumPy numbers, the result is also a float16 number.

My question is: how exactly is the result computed?

Say I'm multiplying/adding two float16 numbers: does Python generate the result in float32 and then truncate/round the result to float16? Or is the calculation performed in '16bit multiplexer/adder hardware' all the way?

Another question: is there a float8 type? I couldn't find one. If not, then why? Thank you all!

Answer

To the first question: there's no hardware support for `float16` on a typical processor (at least outside the GPU). NumPy does exactly what you suggest: convert the `float16` operands to `float32`, perform the scalar operation on the `float32` values, then round the `float32` result back to `float16`. It can be proved that the results are still correctly rounded: the precision of `float32` is large enough (relative to that of `float16`) that double rounding isn't an issue here, at least for the four basic arithmetic operations and square root.

In the current NumPy source, this is what the definition of the four basic arithmetic operations looks like for `float16` scalar operations.

```
#define half_ctype_add(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) + npy_half_to_float(b))
#define half_ctype_subtract(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) - npy_half_to_float(b))
#define half_ctype_multiply(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) * npy_half_to_float(b))
#define half_ctype_divide(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) / npy_half_to_float(b))
```

The code above is taken from `scalarmath.c.src` in the NumPy source. You can also take a look at `loops.c.src` for the corresponding code for array ufuncs. The supporting `npy_half_to_float` and `npy_float_to_half` functions are defined in `halffloat.c`, along with various other support functions for the `float16` type.
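For readers who don't want to dig through the C source, here is a rough Python analogue of the convert, compute, round-back pattern used by those macros, working on raw 16-bit patterns. This is a sketch under assumptions, not a port of `halffloat.c`: the function names `half_bits_to_float`, `float_to_half_bits`, `half_add` and `half_multiply` are invented for the example, it needs Python 3.6+ for the `struct` format `"e"`, and `struct` raises `OverflowError` for results too large for `float16` rather than returning infinity.

```python
import struct


def half_bits_to_float(bits):
    """Decode a 16-bit pattern as a float16 value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_half_bits(x):
    """Round to float32, then to float16, and return the 16-bit pattern."""
    single = struct.unpack("<f", struct.pack("<f", x))[0]
    return struct.unpack("<H", struct.pack("<e", single))[0]


def half_add(a_bits, b_bits):
    # Same shape as the half_ctype_add macro: widen, add, narrow.
    return float_to_half_bits(half_bits_to_float(a_bits) + half_bits_to_float(b_bits))


def half_multiply(a_bits, b_bits):
    # Same shape as the half_ctype_multiply macro: widen, multiply, narrow.
    return float_to_half_bits(half_bits_to_float(a_bits) * half_bits_to_float(b_bits))


# 0x3C00 is 1.0, 0x4000 is 2.0, 0x4200 is 3.0 and 0x4600 is 6.0 in float16.
print(hex(half_add(0x3C00, 0x4000)))       # 0x4200
print(hex(half_multiply(0x4000, 0x4200)))  # 0x4600
```

The arithmetic itself runs in binary64 here and is then rounded to `float32`; for the sum or product of two `float16` values the binary64 result is exact, so that rounding gives the same value a `float32` operation would.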

For the second question: no, there's no `float8` type in NumPy. `float16` is a standardized type (described in the IEEE 754 standard) that's already in wide use in some contexts (notably GPUs). There's no IEEE 754 `float8` type, and there doesn't appear to be an obvious candidate for a "standard" `float8` type. I'd also guess that there just hasn't been that much demand for `float8` support in NumPy.