mobileDev07 mobileDev07 - 17 days ago 8
Javascript Question

How are the JS Arrays internally resizing?

So I've been trying to implement a collection type of class (similar to List found in C#) in JS that has some custom functionalities. I also wanted it to be somewhat optimized (I've read some articles on how to properly use JS Arrays).
So I thought to myself "if we don't define an initial size to an Array and we keep adding objects to it, internally it will have to allocate a new size for each insertion, that must be slow. I can avoid this by allocating a new size myself (changing the array length), somewhat similar to how it is done in CSharp, doubling in size whenever the max capacity is reached (I know it's not this trivial but it's a start)"

I tried to implement this idea and found out that it is way slower (~10x slower):

//This simplified approach of my implementation is faster...
var array = [];
var counter = 0;
function addItem(newItem) {
array[++counter] = newItem;
}

//..than this version that resizes the array when a limit is reached
var array = [];
array.length = INITIAL_SIZE;
/*
Alternatively
var array = new Array(INITIAL_SIZE);
*/
var counter = 0;
function addItem(newItem) {
if( CheckCapacity(counter + 1) ) { //Function that checks if the maximum size is reached and if it is, change the array.length to the new size
array[++counter] = newItem;
}
}


Before testing this, I thought to myself, "since I've a new size for the array when I call CheckCapacity(counter + 1), internally it (JS Array) won't have to make as much operations compared to the first function since I make sure that there is space available, more than necessary", i.e., the array[++counter] = newItem line on the second function should be faster compared to the same one in the first function.
I've even used different arrays which contained pre-calculated sizes for the one holding the items; it still was slower.

So back to my question, how is the implementation of JS Array allocating the necessary size? Am I correct to assume that not much can be done to speed this process up? To me it made sense that the of the drawbacks of having an object (the JS Array) that dynamically allocates more memory each time a new item is added, would be the loss of speed (unless it has pretty good algorithms implemented, but I don't know, hence my question).

Answer

In Javascript, an Array is an abstraction. How it is implemented (and when allocation and resizing is performed) is left up to the Javascript engine - the ECMAScript specification does not dictate how this is done. So there is basicallly no precise way to know.

In practice, Javascript engines are very clever about how the allocate memory and the make sure not to allocate too much. In my opinion, they are far more sophisticated than C#'s List -- because Javascript engines can dynamically change the underlying data structure depending on the situation. The algorithms vary, but most will consider whether there are any "holes" in your array:

var array = [];
array[0] = "foo"          // is a resizable array
array[1] = "bar"          // is a resizable array
array[2] = "baz"          // is a resizable array
array[1000000] = "hello"; // is now a hash table
console.log(array[1000000]) // "hello"

If you use arrays normally and use contiguous keys starting at zero, then there are no "holes" and most Javascript engines will represent the Javascript array by using a resizable array data structure. Now consider the fourth assignment, I've created a so-called "hole" of roughly a size of a million (the hole spans slots 3-999999). It turns out, Javascript engines are clever enough not to allocate ~1 million slots in memory for this massive hole. It detects that we have a hole, it will now, represent the Javascript array using a Dictionary / hash-table like data structure (it uses a binary search tree where the keys are hashed) to save space. It won't store space for the hole, just four mappings: (0, "foo"), (1, "bar"), (2, "baz"), (1000000, "hello").

Unfortunately, accessing the Array is now slower for the engine because it will now have to compute a hash and traverse a tree. When there are no holes, we use a resizable array and we have quicker access times, but when we have a hole the Array's performance is slower. The common terminology is to say an Array is a dense array, when it is without any holes (it uses a resizable array = better performance), and an Array is a sparse array, when it with one or more holes (it uses a hash table = slower performance). For best performance in general, try to use dense arrays.

Now to finish off, let me tell you that the following is a bad idea:

var array = new Array(1000000);
array[0] = "foo";               // is a hash table

The array above has a hole of size ~1 million (it's like this: ["foo", undefined, undefined, ... undefined]) and so therefore, it is using a hash-table as the underlying data structure. So implementing the resizing yourself is a bad idea - it will create a hole and cause worst performance than better. You're only confusing the Javascript engine. This is what your code was doing, your array always had a hole in it and therefore was using a hash table as the underlying data structure; giving slower performance compared to an array without any holes (aka the first version of your code).

Am I correct to assume that not much can be done to speed this process up?

Yes, there is little to be done on the user's side regarding pre-allocation of space. To speed up Javascript arrays in general you want to avoid creating sparse arrays (avoid created holes):

  1. Don't pre-allocate using new Array(size). Instead "grow as you go". The engine will work out the size of the underlying resizable array itself.
  2. Use contiguous integer keys starting at 0. Don't start from a big integer. Don't add keys that are not integers (e.g. don't use strings as keys).
  3. Try not to delete keys in the middle of arrays (don't delete the element at index 5 from an array with indices 0-9 filled in).
  4. Don't convert to and from dense and sparse arrays (i.e. don't repeatedly add and remove holes). There's an overhead for the engine to convert to and from the resizable array vs hash-table representations.

The disadvantage of [JS Arrays over C# Lists is that they] dynamically allocate more memory each time a new item is added

No, not necessarily. C# Lists and Javascipt Arrays are basically the same when the Javascript array has no holes. Both are resizable arrays. The difference is that:

  1. C# Lists give the user more control over the behaviour of the resizable array. In Javascript, you have no control over it -- it's inside the engine.
  2. C# Lists allow the user preallocate memory for better performance, whereas in Javascript, you should let the engine automatically work out how to preallocate memory in the underlying resizable array for better performance.