Patrick Patrick - 4 months ago 10
Java Question

Why are local variable length for-loops faster? Doesn't branch prediction reduce the effect of lookup times?

So a while back when I was reading up on some Android performance tips when I came by:

Foo[] mArray = ...

public void zero() {
int sum = 0;
for (int i = 0; i < mArray.length; ++i) {
sum += mArray[i].mSplat;

public void one() {
int sum = 0;
Foo[] localArray = mArray;
int len = localArray.length;

for (int i = 0; i < len; ++i) {
sum += localArray[i].mSplat;

So Google says:

is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.

is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.

Which made total sense. But after thinking way too much about my computer architecture exam I remembered Branch Predictors:

a branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure. The purpose of the branch predictor is to improve the flow in the instruction pipeline.

Isn't the computer assuming
i < mArray.length
and compute the loop condition and the body of the loop in parallel (with the exception the very last loop condition check) , effectively removing any performance loses?

I was also thinking about Speculative Execution:

Speculative execution is an optimization technique where a computer system performs some task that may not be actually needed... The objective is to provide more concurrency...

In this case, the computer would be executing the code as if the loop had finished and was still going concurrently at the same time, once again, effectively nullifying any computational costs associated with the condition (since the computer's already performing computations for the future while it computes the condition)?

Essentially what I'm trying to get at is the fact that, even if the condition in
takes a little longer to compute than
, the computer is usually going to compute the correct branch of code while it's waiting to retrieve the answer to the conditional statement anyway, so the performance loss in the lookup to
shouldn't matter (I thought).

Is there something I'm not realizing here?

Sorry about the length of the question.

Thanks in advance.


The site you linked to notes:

zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.

I haven't tested on Android, but I'll assume that this is true for now. What this means is that for every iteration of the loop the CPU has to execute code that loads the value of mArray.length from memory. The reason is that the length of the array may change so the compiler can't treat it as a static value.

Whereas in the one() option the programmer explicitly sets the len variable based on knowledge that the array length won't change. Since this is a local variable the compiler can store it in a register rather than loading it from memory in each loop iteration. So this will reduce the number of instructions executed in the loop, and it will make the branch easier to predict.

You are right that branch prediction helps reduce the overhead associated with the loop condition check. But there is still a limit to how much speculation is possible so executing more instructions in each loop iteration can incur additional overhead. Also many mobile processors have less advanced branch predictors and don't support as much speculation.

My guess is that on a modern desktop processor using an advanced Java JIT like HotSpot that you would not see a 3X performance difference. But I don't know for certain, it could be an interesting experiment to try.