Kamen Stoykov Kamen Stoykov - 1 month ago 18
Node.js Question

Javascript odd performance special in V8

Today I have written few line of JS and I was stunned.... Maybe I miss something but can't figure it out.

Situation is as follow. I have a function which is called twice (or even more times). The first execution of function is faster than following executions.

Code was tested in Node & Chrome (V8 engine)
Firefox is executing the code each time at fixed speed, which is quite slower than V8 Engines. Whatever different execution speed across the engines are not important. The question is about why function first execution is faster than others in V8.

Here is the code: (It can be copy/pasted in chrome console and you will see the result)

var loop = 10000000;

function callable() {
return Math.random();
}

function measureFunction(index) {
var result = 0;
var timer = new Date();
var start = timer.getTime();
for (var i = 0; i < loop; ++i)
result += callable();

res[index] = "RESULT FUNCTION: " + result + " FOR: " + (new Date().getTime() - start);
}

var res = new Array(2);

for (var i = 0 ;i < res.length; ++i)
measureFunction(i);

for (var i = 0; i < res.length; ++i)
console.log(res[i]);

Answer

TL;DR: it's because of the GC. But it's complicated.

I've reproduced your observation with a debug build of V8 version 4.8.0 (candidate) (the one I've had handy) on this slightly modified js:

var loop = 10000000 * 10;

function callable() {
    return Math.random();
}

function measureFunction(index) {
    var result = 0;
    var timer = new Date();
    var start = timer.getTime();
    for (var i = 0;  i < loop;  ++i)
        result += callable();

    res[index] = "RESULT FUNCTION: " + result + " FOR: " + (new Date().getTime() - start) + " ms";
}

var res = new Array(3);

for (var i = 0  ;i < res.length;  ++i) {
    measureFunction(i);
    print (i + " COMPLETE"); // use console.log for node
}

for (var i = 0;  i < res.length;  ++i)
    print(res[i]); // ditto

It gives the following output on my machine:

0 COMPLETE
1 COMPLETE
2 COMPLETE
RESULT FUNCTION: 49997528.61602645 FOR: 649 ms
RESULT FUNCTION: 49996578.63860239 FOR: 1402 ms
RESULT FUNCTION: 49995279.39097646 FOR: 1400 ms

After that I've run the v8 shell with the following options: d8 main.js --trace-opt --trace-deopt --trace-gc

And it gave the following output (abridged):

    ...
[marking 0xaaf9fce2a79 <JS Function measureFunction (SharedFunctionInfo 0xaaf9fce2501)> for recompilation..
[didn't find optimized code in optimized code map for 0xaaf9fce2501 <SharedFunctionInfo measureFunction>]
[compiling method 0xaaf9fce2a79 <JS Function measureFunction (SharedFunctionInfo 0xaaf9fce2501)> using Crankshaft OSR]
[optimizing 0xaaf9fce2a79 <JS Function measureFunction (SharedFunctionInfo 0xaaf9fce2501)> - took 1.092, 3.601, 2.595 ms]
[didn't find optimized code in optimized code map for 0xaaf9fce2501 <SharedFunctionInfo measureFunction>]
[optimizing 0xaaf9fcd2181 <JS Function random (SharedFunctionInfo 0xaaf9fc5c111)> - took 0.445, 2.367, 0.122 ms]
[completed optimizing 0xaaf9fcd2181 <JS Function random (SharedFunctionInfo 0xaaf9fc5c111)>]
[deoptimizing (DEOPT eager): begin 0xaaf9fce2a79 <JS Function measureFunction (SharedFunctionInfo 0xaaf9fce2501)> (opt #3) @14, FP to SP delta: 120]
            ;;; deoptimize at 298: wrong instance type
    ... 
[deoptimizing (eager): ... took 0.099 ms]
Materialization [0x7ffffae625a8] <- 0x25c6de07d439 ;  0x25c6de07d439 <Number: 5.00034e+07>
[removing optimized code for: measureFunction]
[evicting entry from optimizing code map (notify deoptimized) for 0xaaf9fce2501 <SharedFunctionInfo measureFunction> (osr ast id 71)]

0 COMPLETE

[marking 0xaaf9fce2a79 <JS Function measureFunction> for recompilation, reason: small function, ICs with typeinfo: 13/15 (86%)...]
[14386:0x49c5fb0]      657 ms: Scavenge 2.1 (37.1) -> 1.2 (37.1) MB, 1.2 / 0 ms [allocation failure].
[didn't find optimized code in optimized code map for 0xaaf9fce2501 <SharedFunctionInfo measureFunction>]
[compiling method 0xaaf9fce2a79 <JS Function measureFunction (SharedFunctionInfo 0xaaf9fce2501)> using Crankshaft OSR]
[optimizing 0xaaf9fce2a79 <JS Function measureFunction (SharedFunctionInfo 0xaaf9fce2501)> - took 1.232, 5.863, 0.621 ms]
[didn't find optimized code in optimized code map for 0xaaf9fce2501 <SharedFunctionInfo measureFunction>]
[14386:0x49c5fb0]      667 ms: Scavenge 2.1 (37.1) -> 1.2 (37.1) MB, 0.7 / 0 ms [allocation failure].
[14386:0x49c5fb0]      668 ms: Scavenge 2.2 (37.1) -> 1.2 (37.1) MB, 0.4 / 0 ms [allocation failure].
[14386:0x49c5fb0]      669 ms: Scavenge 2.2 (37.1) -> 1.2 (37.1) MB, 0.4 / 0 ms [allocation failure].
[14386:0x49c5fb0]      669 ms: Scavenge 2.2 (37.1) -> 1.2 (37.1) MB, 0.4 / 0 ms [allocation failure].
[14386:0x49c5fb0]      670 ms: Scavenge 2.2 (37.1) -> 1.2 (37.1) MB, 0.4 / 0 ms [allocation failure].
... and so on, 1550 times ...

1 COMPLETE

Same thing (only the scavenger messages) for 2.

The situation changes if I supply, for example, --gc-interval=1 to v8. In this case Scavenge and Mark-sweep GC cycles happen during the first call as well and the output looks like this:

0 COMPLETE
1 COMPLETE
2 COMPLETE
RESULT FUNCTION: 50005046.56689139 FOR: 919 ms
RESULT FUNCTION: 50006871.86618896 FOR: 678 ms
RESULT FUNCTION: 49998279.72474023 FOR: 670 ms

UPD

It turns out eljefedelrodeodeljefe is partially right.

Let's see what happens.

measureFunc starts off unoptimized. It gets hot very quickly so Crankshaft optimizes it and an OSR entry is performed (execution continues where it left off -- from the middle of the loop -- but on the newly optimized version of machine code). But for some reason shortly after that a speculative assumption is broken, causing a bailout (OSR exit to unoptimized, 'full' code).

And V8 doesn't try to recompile measureFunc again before the first call ends. Probably because it already OSRed into optimized function and a speculative assumption failed, so it sees no point in eager retrying (I suppose. I don't know what heuristics are used exactly).

So most of the time the first call of measureFucntion is executed on the Full-compiler tier. And this full code doesn't seem to trigger GC while it's running in the loop. I don't know if it's intentional or a bug. It's a subject to a further investigation.

So yes, the first call execution is (mostly) unoptimized and the subsequent executions are optimized. But V8 doesn't decide to run the code slower. The first execution happens to be faster because the unoptimized code doesn't stop to trigger GC. And it makes real difference in this example, because the code is actually GC-intensive (due to a lot of heap number allocations with short life-time).