Tagir Valeev - 5 months ago 10
Java Question

# Why is Arrays.fill() not used in HashMap.clear() anymore?

I noticed something strange in the implementation of

`HashMap.clear()`
. This is how it looked in OpenJDK 7u40:

``````public void clear() {
modCount++;
Arrays.fill(table, null);
size = 0;
}
``````

And this is how it looks as of OpenJDK 8u40:

``````public void clear() {
Node<K,V>[] tab;
modCount++;
if ((tab = table) != null && size > 0) {
size = 0;
for (int i = 0; i < tab.length; ++i)
tab[i] = null;
}
}
``````

I understand that now the
`table`
can be null for empty an map, thus the additional check and caching in a local variable is required. But why was
`Arrays.fill()`
replaced with a for-loop?

It seems that the change was introduced in this commit. Unfortunately I found no explanation for why a plain for loop might be better than
`Arrays.fill()`
. Is it faster? Or safer?

Answer

I will try to summarize three moreless reasonable versions which were proposed in comments.

@Holger says:

I guess that this is to avoid the class java.util.Arrays getting loading as a side effect of this method. For application code, this is usually not a concern.

This is the most easy thing to test. Let's compile such program:

``````public class HashMapTest {
public static void main(String[] args) {
new java.util.HashMap();
}
}
``````

Run it with `java -verbose:class HashMapTest`. This will print the class loading events as they occur. With JDK 1.8.0_60 I see more than 400 classes loaded:

``````... 155 lines skipped ...
[Loaded java.util.Set from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.AbstractSet from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections\$EmptySet from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections\$EmptyList from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections\$EmptyMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections\$UnmodifiableCollection from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections\$UnmodifiableList from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections\$UnmodifiableRandomAccessList from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.Reflection from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
**[Loaded java.util.HashMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.HashMap\$Node from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class\$3 from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class\$ReflectionData from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class\$Atomic from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.generics.repository.AbstractRepository from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.generics.repository.GenericDeclRepository from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.generics.repository.ClassRepository from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class\$AnnotationData from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.annotation.AnnotationType from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.WeakHashMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.ClassValue\$ClassValueMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.reflect.Modifier from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.LangReflectAccess from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.reflect.ReflectAccess from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
**[Loaded java.util.Arrays from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
...
``````

As you can see, `HashMap` is loaded long before application code and `Arrays` is loaded only 14 classes after `HashMap`. The `HashMap` load is triggered by `sun.reflect.Reflection` initialization as it has `HashMap` static fields. The `Arrays` load is likely to be triggered by `WeakHashMap` load which actually has `Arrays.fill` in the `clear()` method. The `WeakHashMap` load is triggered by `java.lang.ClassValue\$ClassValueMap` which extends `WeakHashMap`. The `ClassValueMap` is present in every `java.lang.Class` instance. So to me seems that without `Arrays` class the JDK cannot be initialized at all. Also the `Arrays` static initializer is very short, it only initializes the assertion mechanism. This mechanism is used in many other classes (including, for example, `java.lang.Throwable` which is loaded very early). No other static initialization steps are performed in `java.util.Arrays`. Thus @Holger version seems incorrect to me.

Here we also found very interesting thing. The `WeakHashMap.clear()` still uses `Arrays.fill`. It's interesting when it appeared there, but unfortunately this goes to prehistoric times (it was already there in the very first public OpenJDK repository).

Next, @MarcoTopolnik says:

Safer surely not, but it might be faster when the `fill` call is not inlined and `tab` is short. On HotSpot both the loop and the explicit `fill` call will result in a fast compiler intrinsic (in a happy-day scenario).

It was actually surprising for me that `Arrays.fill` is not directly intrinsified (see intrinsic list generated by @apangin). Seems that such loop can be recognized and vectorized by JVM without explicit intrinsic handling. So it's true that extra call can be not inlined in very specific cases (for example if `MaxInlineLevel` limit is reached). On the other hand it's very rare situation and it's only a single call, it's not a call inside loop, and it's a static, not virtual/interface call, thus the performance improvement could be only marginal and only in some specific scenarios. Not the thing the JVM developers usually care.

Also it should be noted that even C1 'client' compiler (tier 1-3) is capable to inline `Arrays.fill` called, for example, in `WeakHashMap.clear()`, as inlining log (`-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining`) says:

``````36       3  java.util.WeakHashMap::clear (50 bytes)
!m        @ 4   java.lang.ref.ReferenceQueue::poll (28 bytes)
@ 17   java.lang.ref.ReferenceQueue::reallyPoll (66 bytes)   callee is too large
@ 28   java.util.Arrays::fill (21 bytes)
!m        @ 40   java.lang.ref.ReferenceQueue::poll (28 bytes)
@ 17   java.lang.ref.ReferenceQueue::reallyPoll (66 bytes)   callee is too large
@ 1   java.util.AbstractMap::<init> (5 bytes)   inline (hot)
@ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
@ 9   java.lang.ref.ReferenceQueue::<init> (27 bytes)   inline (hot)
@ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
@ 10   java.lang.ref.ReferenceQueue\$Lock::<init> (5 bytes)   unloaded signature classes
@ 62   java.lang.Float::isNaN (12 bytes)   inline (hot)
@ 112   java.util.WeakHashMap::newTable (8 bytes)   inline (hot)
``````

Of course, it's also easily inlined by smart and powerful C2 'server' compiler. Thus I see no problems here. Seems that @Marco version is incorrect either.

Finally we have a couple of comments from @StuartMarks (who is JDK developer, thus some official voice):

Interesting. My hunch is that this is a mistake. The review thread for this changeset is here and it references an earlier thread that is continued here. The initial message in that earlier thread points to a prototype of HashMap.java in Doug Lea's CVS repository. I don't know where this came from. It doesn't seem to match anything in the OpenJDK history.

... In any case, it might have been some old snapshot; the for-loop was in the clear() method for many years. The Arrays.fill() call was introduced by this changeset, so it was in the tree only for a few months. Note also that the power-of-two computation based on Integer.highestOneBit() introduced by this changeset also disappeared at the same time, though this was noted but dismissed during the review. Hmmm.

Indeed the `HashMap.clear()` contained the loop many years, was replaced with `Arrays.fill` on Apr 10th, 2013 and stayed less one half-a-year until Sept 4th when the discussed commit was introduced. The discussed commit was actually a major rewrite of the `HashMap` internals to fix JDK-8023463 issue. It was a long story about possibility to poison the `HashMap` with keys having duplicating hashcodes reducing `HashMap` search speed to linear making it vulnerable to DoS-attacks. The attempts to solve this were performed in JDK-7 including some randomization of String hashCode. So seems that the `HashMap` implementation was forked from the earlier commit, developed independently, then merged into the master branch overwriting several changes introduced in-between.

We may support this hypothesis performing a diff. Take the version where `Arrays.fill` was removed (2013-09-04) and compare it with previous version (2013-07-30). The `diff -U0` output has 4341 lines. Now let's diff against the version prior to one when `Arrays.fill` was added (2013-04-01). Now `diff -U0` contains only 2680 lines. Thus the newer version actually more similar to the older than to immediate parent.

Conclusion

So to conclude I would agree with Stuart Marks. There were no concrete reason to remove `Arrays.fill`, it's just because the in-between change was overwritten by mistake. Using `Arrays.fill` is perfectly fine both in JDK code and in user applications and used, for example, in `WeakHashMap`. The `Arrays` class is loaded anyways pretty early during the JDK initialization, has very simple static initializer and `Arrays.fill` method can be easily inlined even by client compiler, so no performance drawback should be noted.