Erwin Bolwidt - 3 months ago
Java Question

Does Java strictfp modifier have any effect on modern CPUs?

I know the meaning of the strictfp modifier on methods (and on classes), according to the JLS:

JLS 8.4.3.5, strictfp methods:


The effect of the strictfp modifier is to make all float or double
expressions within the method body be explicitly FP-strict (§15.4).


JLS 15.4 FP-strict expressions:


Within an FP-strict expression, all intermediate values must be
elements of the float value set or the double value set, implying that
the results of all FP-strict expressions must be those predicted by
IEEE 754 arithmetic on operands represented using single and double
formats.

Within an expression that is not FP-strict, some leeway is granted for
an implementation to use an extended exponent range to represent
intermediate results; the net effect, roughly speaking, is that a
calculation might produce "the correct answer" in situations where
exclusive use of the float value set or double value set might result
in overflow or underflow.


I've been trying to come up with a way to get an actual difference between an expression in a strictfp method and one that is not strictfp. I've tried this on two laptops, one with an Intel Core i3 CPU and one with an Intel Core i7 CPU, and I can't get any difference.

A lot of posts suggest that native floating point, not using strictfp, could be using 80-bit floating point numbers, and so have extra representable numbers below the smallest possible Java double (closest to zero) or above the highest possible 64-bit Java double.

I tried this code below with and without a strictfp modifier and it gives exactly the same results.

public static strictfp void withStrictFp() {
    double v = Double.MAX_VALUE;
    System.out.println(v * 1.0000001 / 1.0000001);
    v = Double.MIN_VALUE;
    System.out.println(v / 2 * 2);
}


Actually, I assume that any difference would only show up when the code is compiled to assembly, so I am running it with the -Xcomp JVM argument. Still no difference.
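The comparison described above can be made explicit with a small harness that runs the same two expressions through a strictfp method and a plain method and compares the raw bit patterns. This is only a sketch: the class name StrictFpDemo and its method names are made up for illustration, and on Java 17 and later (where all floating-point evaluation is strict) javac may warn that the modifier is redundant.

```java
// Illustrative sketch; StrictFpDemo and its method names are hypothetical.
class StrictFpDemo {

    // FP-strict: every intermediate value must be in the double value set.
    static strictfp double overflowStrict(double v, double k) {
        return v * k / k;
    }

    // Default mode: before Java 17 an implementation was allowed to use
    // an extended exponent range for the intermediate product here.
    static double overflowDefault(double v, double k) {
        return v * k / k;
    }

    static strictfp double underflowStrict(double v) {
        return v / 2 * 2;
    }

    static double underflowDefault(double v) {
        return v / 2 * 2;
    }

    // Compare bit patterns, so that a 0.0 versus -0.0 mismatch would
    // also count as a difference.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        double strictOver = overflowStrict(Double.MAX_VALUE, 1.0000001);
        double defaultOver = overflowDefault(Double.MAX_VALUE, 1.0000001);
        System.out.println("overflow:  strict=" + strictOver
                + " default=" + defaultOver
                + " same=" + sameBits(strictOver, defaultOver));

        double strictUnder = underflowStrict(Double.MIN_VALUE);
        double defaultUnder = underflowDefault(Double.MIN_VALUE);
        System.out.println("underflow: strict=" + strictUnder
                + " default=" + defaultUnder
                + " same=" + sameBits(strictUnder, defaultUnder));
    }
}
```

Passing the operands as parameters keeps javac from folding the expressions at compile time. The strictfp variants must return Infinity and 0.0 respectively, since the intermediate product overflows and the intermediate quotient rounds to zero in the double value set; any platform where the plain variants print something else would be one where the modifier matters.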

I found another post explaining how you can get the assembly code generated by HotSpot. I'm running my code with java -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly.
The first expression (v * 1.0000001 / 1.0000001) with the strictfp modifier, and also the same without it, is compiled to:

0x000000010f10a0a9: movsd -0xb1(%rip),%xmm0 # 0x000000010f10a000
; {section_word}
0x000000010f10a0b1: mulsd -0xb1(%rip),%xmm0 # 0x000000010f10a008
; {section_word}
0x000000010f10a0b9: divsd -0xb1(%rip),%xmm0 # 0x000000010f10a010
; {section_word}


There is nothing in that code that truncates the result of each step to 64 bits like I had expected. Looking up the documentation of movsd, mulsd and divsd, they all mention that these (SSE) instructions operate on 64-bit floating point values, not 80-bit values as I expected. So it seems logical that the double value set that these instructions operate on is already the IEEE 754 value set, so there would be no difference between having strictfp and not having it.

My questions are:


  1. Is this analysis correct? I don't use Intel assembly very often so I'm not confident of my conclusion.

  2. Is there any (other) modern CPU architecture (that has a JVM) for which there is a difference between operation with and without the strictfp modifier?


Answer

If by “modern” you mean processors supporting the sort of SSE2 instructions that you quote in your question as produced by your compiler (mulsd, …), then the answer is no: strictfp does not make a difference, because the instruction set offers nothing that the compiler could take advantage of in the absence of strictfp. The available instructions are already optimal for computing to the precise specifications of strictfp. In other words, on that kind of modern CPU, you get strictfp semantics all the time at no extra cost.

If by “modern” you mean the historical 387 FPU, then it is possible to observe a difference when an intermediate computation would overflow or underflow in strictfp mode: without strictfp, the intermediate result may not overflow or, on underflow, may keep more precision bits than expected.
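To make the underflow case concrete, here is a rough sketch of what the question's v / 2 * 2 expression would yield if the intermediate quotient were held with a wider exponent range. It does not use a 387 at all: scaling by 2^100 with Math.scalb is only a stand-in for the extra exponent range, and the class and method names are invented for this illustration.

```java
// Illustrative sketch; ExtendedExponentSketch and its method names are
// hypothetical, and the 2^100 scaling merely imitates a wider exponent range.
class ExtendedExponentSketch {

    // Plain IEEE 754 binary64 evaluation. strictfp guarantees this on any
    // JVM; Java 17 and later may warn that the modifier is redundant.
    static strictfp double binary64(double v) {
        return v / 2 * 2;
    }

    // Imitate a wider exponent range: scale up by 2^100 (exact), so that
    // halving no longer underflows, then scale the final result back down.
    static double widerExponent(double v) {
        double scaled = Math.scalb(v, 100);
        double result = scaled / 2 * 2;
        return Math.scalb(result, -100);
    }

    public static void main(String[] args) {
        double v = Double.MIN_VALUE;
        System.out.println(binary64(v));      // prints 0.0
        System.out.println(widerExponent(v)); // prints 4.9E-324
    }
}
```

In binary64, Double.MIN_VALUE / 2 lies exactly halfway between 0 and Double.MIN_VALUE and rounds to 0 under round-to-nearest-even, so the final product is 0.0. With the wider range, the halved value is still representable and doubling it recovers Double.MIN_VALUE, which is the kind of discrepancy a non-strictfp computation on the 387 could show.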

A typical strictfp computation compiled for the 387 will look like the assembly in this answer, with well-placed multiplications by well-chosen powers of two to make underflow behave the same as in IEEE 754 binary64. A round-trip of the result through a 64-bit memory location then takes care of overflows.

The same computation compiled without strictfp would produce one 387 instruction per basic operation, for instance just the multiplication instruction fmulp for a source-level multiplication. (The 387 would have been configured to use the same significand width as binary64, 53 bits, at the beginning of the program.)
