plasmacel - 3 months ago
C++ Question

Harsh differences in generated assembly of floating-point comparisons < and >=

I'm experimenting with the generated assembly and found an interesting thing.
There are two functions that perform an identical computation. The only difference between them is how the results are summed.

#include <cmath>

double func1(double x, double y)
{
    double result1;
    double result2;

    if (x*x < 0.0) result1 = 0.0;
    else
    {
        result1 = x*x+x+y;
    }

    if (y*y < 0.0) result2 = 0.0;
    else
    {
        result2 = y*y+y+x;
    }

    return (result1 + result2) * 40.0;
}

double func2(double x, double y)
{
    double result = 0.0;

    if (x*x >= 0.0)
    {
        result += x*x+x+y;
    }

    if (y*y >= 0.0)
    {
        result += y*y+y+x;
    }

    return result * 40.0;
}


The assembly generated by x86 clang 3.7 with the -O2 switch on gcc.godbolt.org is surprisingly different for the two functions (compiling with gcc gives similar assembly):

.LCPI0_0:
        .quad   4630826316843712512     # double 40
func1(double, double):                  # @func1(double, double)
        movapd  %xmm0, %xmm2
        mulsd   %xmm2, %xmm2
        addsd   %xmm0, %xmm2
        addsd   %xmm1, %xmm2
        movapd  %xmm1, %xmm3
        mulsd   %xmm3, %xmm3
        addsd   %xmm1, %xmm3
        addsd   %xmm0, %xmm3
        addsd   %xmm3, %xmm2
        mulsd   .LCPI0_0(%rip), %xmm2
        movapd  %xmm2, %xmm0
        retq

.LCPI1_0:
        .quad   4630826316843712512     # double 40
func2(double, double):                  # @func2(double, double)
        movapd  %xmm0, %xmm2
        movapd  %xmm2, %xmm4
        mulsd   %xmm4, %xmm4
        xorps   %xmm3, %xmm3
        ucomisd %xmm3, %xmm4
        xorpd   %xmm0, %xmm0
        jb      .LBB1_2
        addsd   %xmm2, %xmm4
        addsd   %xmm1, %xmm4
        xorpd   %xmm0, %xmm0
        addsd   %xmm4, %xmm0
.LBB1_2:
        movapd  %xmm1, %xmm4
        mulsd   %xmm4, %xmm4
        ucomisd %xmm3, %xmm4
        jb      .LBB1_4
        addsd   %xmm1, %xmm4
        addsd   %xmm2, %xmm4
        addsd   %xmm4, %xmm0
.LBB1_4:
        mulsd   .LCPI1_0(%rip), %xmm0
        retq


func1 compiles to branchless assembly with far fewer instructions than func2, so func2 is expected to be much slower than func1.

Can someone explain this behavior?

Answer

The two operators behave differently when a NaN is involved. Every ordered comparison (<, <=, >, >=) with a NaN operand evaluates to false. For any non-NaN x, x*x is never negative, and if x is NaN then x*x is NaN and the comparison is false anyway. So x*x < 0.0 is false for every possible x, and the compiler can safely remove the branch. In contrast, x*x >= 0.0 is true for every non-NaN x but false when x is NaN, so the two functions are not equivalent and the compiler has to keep the conditional jumps in the assembly for func2.
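To see that the two functions really are not equivalent, here is a minimal sketch that repeats the logic of func1 and func2 from the question and feeds them NaN inputs. The helper check_nan_inputs is hypothetical (it is not part of the question), and the sketch assumes IEEE 754 doubles and a build without -ffast-math or similar flags.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Same logic as func1 in the question: the `<` tests can never be true,
// so both else-branches always run.
double func1(double x, double y)
{
    double result1;
    double result2;

    if (x * x < 0.0) result1 = 0.0;
    else             result1 = x * x + x + y;

    if (y * y < 0.0) result2 = 0.0;
    else             result2 = y * y + y + x;

    return (result1 + result2) * 40.0;
}

// Same logic as func2 in the question: each `>=` test is false when
// the product is NaN, so the corresponding addition is skipped.
double func2(double x, double y)
{
    double result = 0.0;

    if (x * x >= 0.0) result += x * x + x + y;
    if (y * y >= 0.0) result += y * y + y + x;

    return result * 40.0;
}

// Hypothetical self-check, not part of the question.
void check_nan_inputs()
{
    const double qnan = std::numeric_limits<double>::quiet_NaN();

    // For ordinary inputs both functions agree.
    assert(func1(1.0, 2.0) == func2(1.0, 2.0));

    // With NaN inputs func1 propagates the NaN ...
    assert(std::isnan(func1(qnan, qnan)));

    // ... while func2 skips both additions and returns 0.0 * 40.0.
    assert(func2(qnan, qnan) == 0.0);
}
```

Calling check_nan_inputs() from main should pass silently under those assumptions. Because the results differ for NaN inputs, the optimizer is not allowed to turn func2 into the branchless form it produces for func1.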

This is what cppreference says about comparing with NaNs involved:

the values of the operands after conversion are compared in the usual mathematical sense (except that positive and negative zeroes compare equal and any comparison involving a NaN value returns zero)
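The quoted rule can be checked directly. The following is a small sketch, assuming IEEE 754 doubles and no -ffast-math; the helper names square_is_negative, square_is_non_negative and check_comparison_rules are made up for illustration.

```cpp
#include <cassert>
#include <limits>

// Hypothetical wrappers around the two tests used in the question.
bool square_is_negative(double x)     { return x * x < 0.0; }
bool square_is_non_negative(double x) { return x * x >= 0.0; }

// Hypothetical self-check of the comparison rules.
void check_comparison_rules()
{
    const double qnan = std::numeric_limits<double>::quiet_NaN();

    // Relational comparisons involving NaN yield false.
    assert(!(qnan < 0.0));
    assert(!(qnan >= 0.0));

    // Positive and negative zero compare equal.
    assert(0.0 == -0.0);

    // x*x < 0.0 is false for ordinary values and for NaN alike.
    assert(!square_is_negative(-3.0));
    assert(!square_is_negative(qnan));

    // x*x >= 0.0 is true for ordinary values but false for NaN.
    assert(square_is_non_negative(-3.0));
    assert(!square_is_non_negative(qnan));
}
```

Calling check_comparison_rules() from main should pass silently. The first wrapper returns the same value for every input, which is why the compiler can fold it to a constant; the second one depends on whether the input is NaN, so it cannot be folded.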