Python Fanboy - 9 months ago
Python Question

Using float literals instead of double in Cython code?

When compiling the Cython code below with MSVC:

cpdef float interpolate(float start, float end, float alpha):
    return end * alpha + start * (1.0 - alpha)


I get this warning:

warning C4244: '=': conversion from 'double' to 'float', possible loss of data


It's related to the 1.0 in the code, which should be a float but is a double. Can the Cython code above be modified to prevent the warning from appearing?

Edit: Just found out that I can cast this literal to float like this: <float>1.0. Does this affect runtime performance at all?

Answer Source

The revised question is whether there is any run-time performance penalty to writing

cpdef float interpolate_cast(float start, float end, float alpha):
    return end * alpha + start * (<float>1.0 - alpha)

instead of

cpdef float interpolate_lit(float start, float end, float alpha):
    return end * alpha + start * (1.0f - alpha)

(if you could write that, which you can't).

In general, the answer to questions of this type is "of course not, the compiler will generate exactly the same machine code either way (make sure you did turn the optimizer on)"; but for floating point it's not always a sure thing, because there are non-obvious restrictions on how floating point can be optimized.
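To make that caveat concrete, here is a minimal C sketch of a case where the literal's type does change the result, next to the case from the question where it doesn't. The function names are made up for illustration, and it assumes IEEE 754 floats evaluated in single precision (as on x86-64 with SSE). Because 0.1 is not exactly representable, multiplying in double and then narrowing is not the same operation as multiplying in float, so the compiler is not allowed to swap one for the other. 1.0 is exactly representable, so casting it loses nothing.

```c
/* Sketch only: names are illustrative, not from the question.
   Assumes IEEE 754 floats evaluated in single precision
   (FLT_EVAL_METHOD == 0, e.g. x86-64 with SSE). */

/* Double literal: x is promoted, multiplied in double, then narrowed. */
float scale_double_literal(float x)
{
    return (float)(x * 0.1);
}

/* Float literal: the multiplication happens entirely in single precision. */
float scale_float_literal(float x)
{
    return x * 0.1f;
}

/* The question's case: 1.0 is exactly representable, so the cast is exact. */
float one_cast(void)    { return (float)1.0; }
float one_literal(void) { return 1.0f; }
```

With x = 9.0f the two scale functions should differ by one unit in the last place, whereas one_cast() and one_literal() return the same value.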

After stripping out an enormous volume of CPython integration glue, this is the code that Cython generates for the first function above:

float interpolate_cast(float start, float end, float alpha) {
  float r;

  r = ((end * alpha) + (start * (((float)1.0) - alpha)));
  goto L0;
  L0:;
  return r;
}

I manually created a second copy of this function with (float)1.0 changed to 1.0f, and compiled both with GCC 6.3 (on x86-64, using -O2 -march=native, not using -ffast-math). This is the assembly code I got (again, a bunch of irrelevant chatter has been removed):

interpolate_cast:
        vmovss  .LC0(%rip), %xmm3
        vsubss  %xmm2, %xmm3, %xmm3
        vmulss  %xmm0, %xmm3, %xmm0
        vfmadd231ss     %xmm2, %xmm1, %xmm0
        ret

interpolate_lit:
        vmovss  .LC0(%rip), %xmm3
        vsubss  %xmm2, %xmm3, %xmm3
        vmulss  %xmm0, %xmm3, %xmm0
        vfmadd231ss     %xmm2, %xmm1, %xmm0
        ret

.LC0:
        .long   1065353216

So you can see that it comes out exactly the same either way. (The mysterious number 1065353216 is 0x3f800000, which is the IEEE 754 single-precision bit pattern for 1.0.) You can repeat this experiment with MSVC to find out whether that compiler does the same thing (I would expect it to).
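If you want to repeat the experiment, a stripped-down source file along these lines should be enough. The two functions are the ones compared above; float_bits is a helper I'm adding here purely to decode the .LC0 constant, and it assumes float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* As Cython emits it: a double literal cast to float. */
float interpolate_cast(float start, float end, float alpha)
{
    return end * alpha + start * ((float)1.0 - alpha);
}

/* Hand-edited copy: a float literal. */
float interpolate_lit(float start, float end, float alpha)
{
    return end * alpha + start * (1.0f - alpha);
}

/* Hypothetical helper: raw bit pattern of a float (assumes IEEE 754 binary32). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to reinterpret the bytes */
    return bits;
}
```

Compiling this with something like gcc -O2 -march=native -S and comparing the two function bodies should reproduce the listing above, and float_bits(1.0f) gives the value stored at .LC0.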

If this function is truly performance-critical you should be thinking about getting it vectorized. For instance, you could write this C computational kernel:

#include <stddef.h>
void interpolate_many(float *restrict dest,
                      float const *restrict start,
                      float const *restrict end,
                      float const *restrict alpha,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = end[i] * alpha[i] + start[i] * (1.0f - alpha[i]);
}

and put a Cython wrapper around it that takes appropriately-typed NumPy arrays. GCC can autovectorize this; MSVC should be able to as well, and Intel's compiler certainly can. (I wouldn't try to write the kernel in Cython, because you probably won't be able to get it annotated sufficiently to activate the autovectorizer; those consts and restricts are essential.)
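Before wrapping the kernel, it may be worth sanity-checking it from plain C. The sketch below repeats the kernel and adds a small smoke-test function; interpolate_demo and its input values are my own invention, chosen so that every result is exact in binary floating point.

```c
#include <stddef.h>

/* Kernel as given above. */
void interpolate_many(float *restrict dest,
                      float const *restrict start,
                      float const *restrict end,
                      float const *restrict alpha,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = end[i] * alpha[i] + start[i] * (1.0f - alpha[i]);
}

/* Hypothetical smoke test: fills out[0..3] from fixed inputs.
   alpha = 0 should give start, alpha = 1 should give end. */
void interpolate_demo(float out[4])
{
  const float start[4] = {0.0f,  2.0f, 10.0f, -1.0f};
  const float end[4]   = {1.0f,  4.0f, 20.0f,  1.0f};
  const float alpha[4] = {0.25f, 0.5f,  0.0f,  1.0f};

  interpolate_many(out, start, end, alpha, 4);
}
```

Note that GCC versions of that era only run the autovectorizer at -O3 or with -ftree-vectorize; adding -fopt-info-vec makes GCC report which loops it vectorized, which is a quick way to confirm that the restrict qualifiers are doing their job.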