jonsl - 2 months ago

Code possibly optimized away in OpenGL ES 2.0 Fragment Shader on iOS

I am writing yet another GPU Mandelbrot renderer for iOS, and I am getting some unexpected results in the fragment shader.

I have 2 uniforms, and if I test their values independently:

if (u_h0 == 0.00130208337) {
    return 200.; // this line is executed
}

If I comment out the above and test the second uniform instead:

if (u_h1 == -0.0000000000388051084) {
    return 100.; // this line is executed
}

I hope these are valid tests. Now I call a function:

vec2 e_ty = ds_mul(vec2(1., 0.), vec2(0.00130208337, -0.0000000000388051084));
if (e_ty.y == -0.0000000000388051084) {
    return 100.; // this line is executed (correct result)
}

But, the following does not yield the same result:

vec2 e_ty = ds_mul(vec2(1., 0.), vec2(u_h0, u_h1));
if (e_ty.y == -0.0000000000388051084) {
    return 100.; // this is NOT executed
}

Looking a bit further:

vec2 e_ty = ds_mul(vec2(1., 0.), vec2(u_h0, u_h1));
if (e_ty.y == 0.) { // compared against 0. instead of -0.0000000000388051084
    return 100.; // this IS executed
}

What can be going on here? I suspect some compiler optimization is at work, but I cannot find any pragma-type option (to turn off fast math, say) apart from this one, which is available if I switch to OpenGL ES 3.0:

#pragma optimize({on, off}) - enable or disable shader optimization (default on)

Which does not solve my problem. I believe there are:

#pragma optionNV(fastmath off)
#pragma optionNV(fastprecision off)

for NVIDIA, but I cannot find an equivalent for iOS devices.

Does anyone have any ideas? This is driving me nuts.


sorry i meant does anyone have any useful ideas

Yes. Stop trying to equality-compare floating-point numbers. It's almost always a bad idea.

The problem you're having is a direct result of you expecting floating-point comparisons to be exact. They aren't going to be exact. They will never be exact. And there's no setting you can use to make them work.

The specific issue is this:

(u_h1 == -0.0000000000388051084)

This is a comparison of a uniform value with a floating-point literal. The uniform value will be provided by you on the CPU. The literal is also provided by you on the CPU, as interpreted by the GLSL compiler.

If the GLSL compiler uses the same float-parsing algorithm you used to get the float value you provide to the uniform, then odds are good this comparison will work. It's simply doing a floating-point comparison of the data you provided with other data that you also provided.
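This claim can be checked on the CPU alone. Below is a minimal C sketch, assuming the uniform's value was obtained by parsing the same decimal text with `strtof` (the question does not say how the uniform is actually filled). The C compiler's handling of the literal stands in for the GLSL compiler's, and `float_bits` and `same_text_same_bits` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return the raw bit pattern of a 32-bit float. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Compare a compile-time literal with the same text parsed at run time.
   Assumption: both conversions round to the nearest float. */
static bool same_text_same_bits(void)
{
    float from_literal = -0.0000000000388051084f;
    float from_text    = strtof("-0.0000000000388051084", NULL);
    return float_bits(from_literal) == float_bits(from_text);
}
```

When both conversions round correctly, the bit patterns match, so the equality test against the uniform succeeds without any shader arithmetic being involved.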

The key point here is that no GLSL computations will be used.

vec2 e_ty = ds_mul(vec2(1., 0.), vec2(0.00130208337, -0.0000000000388051084));

Assuming that ds_mul is a pure function, this will boil down to a constant expression. Any compiler worth using will execute this function call on the CPU, simply storing the result. And in doing so, it will use the CPU's native floating-point precision and representation.

Indeed, any compiler worth using will realize that e_ty is a constant expression and therefore execute the conditional comparison on the CPU as well.

But either way, the point is the same as before: no GLSL computations will be executed.

vec2 e_ty = ds_mul(vec2(1., 0.), vec2(u_h0, u_h1));

This is an expression based on the value of 2 uniforms. As such, it cannot be optimized away; it must be executed as written on the GPU. Which means you are now at the mercy of the GPU's floating-point precision.

And on this issue, GPUs show no mercy.

Does the GPU permit 32-bit floats? You can use highp and hope for the best. Does the GPU properly handle denormalized IEEE-754 32-bit floats? Odds are good that it does not, and there is absolutely no way for you to force it to do so.

So what is the result of that expression? It will be the result of the math, within the tolerance of the GPU's computation precision. Which you cannot control. Because the GPU used less precision, it computed a value of 0. Which is not equal to the small float value you provided.
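To get a feel for how little precision it takes to lose this value, here is a small C sketch. It assumes, purely for illustration, that the arithmetic ends up in a 16-bit IEEE-754 float (binary16); nothing in the question confirms which format the GPU uses. The names `magnitude`, `flushes_to_zero_in_binary16` and `is_normal_in_binary32` are hypothetical.

```c
#include <float.h>
#include <stdbool.h>

/* 2^-25: half of the smallest positive binary16 subnormal (2^-24).
   Any finite value strictly smaller in magnitude rounds to zero
   when converted to binary16 with round-to-nearest. */
#define BINARY16_ZERO_THRESHOLD 0x1p-25f

/* Absolute value without pulling in the math library. */
static float magnitude(float x) { return x < 0.0f ? -x : x; }

/* True if a finite x would become 0 in a 16-bit float. */
static bool flushes_to_zero_in_binary16(float x)
{
    return magnitude(x) < BINARY16_ZERO_THRESHOLD;
}

/* True if a finite x is a normal (not subnormal, not zero) 32-bit float. */
static bool is_normal_in_binary32(float x)
{
    return magnitude(x) >= FLT_MIN;
}
```

For -0.0000000000388051084 both functions return true: the value is an ordinary normal 32-bit float, yet it is far below anything a 16-bit float can hold. The high word 0.00130208337 survives either way, which would match the observed result of a zero low word under this assumption.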

Whatever algorithm you're trying to use relies on precise floating-point computations. Such things cannot be controlled in GLSL. Therefore, you must devise an algorithm that is more tolerant of floating-point imprecision.
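One common way to make a check tolerant is to replace == with a comparison against a tolerance. A minimal C sketch follows; the helper names and the tolerance parameters are assumptions for illustration, and suitable tolerances depend on the device. The same expression carries over to GLSL using the built-in abs and max.

```c
#include <stdbool.h>

/* Small helpers so the sketch does not depend on the math library. */
static float abs_f(float x) { return x < 0.0f ? -x : x; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Hypothetical helper: true when a and b differ by at most rel_tol
   relative to the larger magnitude, or by at most abs_tol, whichever
   bound is looser. abs_tol covers comparisons near zero, where a
   purely relative bound shrinks to nothing. */
static bool nearly_equal(float a, float b, float rel_tol, float abs_tol)
{
    float diff  = abs_f(a - b);
    float scale = max_f(abs_f(a), abs_f(b));
    return diff <= max_f(rel_tol * scale, abs_tol);
}
```

Note that a tolerance only absorbs rounding noise. With rel_tol = 1e-5 and abs_tol = 0, nearly_equal(0.0f, -0.0000000000388051084f, ...) is still false, so a result that collapsed to zero is reported as different rather than silently accepted.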