Tagir Valeev Tagir Valeev - 4 months ago 8
Java Question

Why is Arrays.fill() not used in HashMap.clear() anymore?

I noticed something strange in the implementation of

HashMap.clear()
. This is how it looked in OpenJDK 7u40:

public void clear() {
modCount++;
Arrays.fill(table, null);
size = 0;
}


And this is how it looks as of OpenJDK 8u40:

public void clear() {
Node<K,V>[] tab;
modCount++;
if ((tab = table) != null && size > 0) {
size = 0;
for (int i = 0; i < tab.length; ++i)
tab[i] = null;
}
}


I understand that now the
table
can be null for empty an map, thus the additional check and caching in a local variable is required. But why was
Arrays.fill()
replaced with a for-loop?

It seems that the change was introduced in this commit. Unfortunately I found no explanation for why a plain for loop might be better than
Arrays.fill()
. Is it faster? Or safer?

Answer

I will try to summarize three moreless reasonable versions which were proposed in comments.

@Holger says:

I guess that this is to avoid the class java.util.Arrays getting loading as a side effect of this method. For application code, this is usually not a concern.

This is the most easy thing to test. Let's compile such program:

public class HashMapTest {
    public static void main(String[] args) {
        new java.util.HashMap();
    }
}

Run it with java -verbose:class HashMapTest. This will print the class loading events as they occur. With JDK 1.8.0_60 I see more than 400 classes loaded:

... 155 lines skipped ...
[Loaded java.util.Set from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.AbstractSet from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections$EmptySet from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections$EmptyList from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections$EmptyMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections$UnmodifiableCollection from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections$UnmodifiableList from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.Collections$UnmodifiableRandomAccessList from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.Reflection from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
**[Loaded java.util.HashMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.HashMap$Node from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class$3 from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class$ReflectionData from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class$Atomic from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.generics.repository.AbstractRepository from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.generics.repository.GenericDeclRepository from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.generics.repository.ClassRepository from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.Class$AnnotationData from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.annotation.AnnotationType from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.util.WeakHashMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.ClassValue$ClassValueMap from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.reflect.Modifier from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded sun.reflect.LangReflectAccess from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
[Loaded java.lang.reflect.ReflectAccess from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
**[Loaded java.util.Arrays from C:\Program Files\Java\jre1.8.0_60\lib\rt.jar]
...

As you can see, HashMap is loaded long before application code and Arrays is loaded only 14 classes after HashMap. The HashMap load is triggered by sun.reflect.Reflection initialization as it has HashMap static fields. The Arrays load is likely to be triggered by WeakHashMap load which actually has Arrays.fill in the clear() method. The WeakHashMap load is triggered by java.lang.ClassValue$ClassValueMap which extends WeakHashMap. The ClassValueMap is present in every java.lang.Class instance. So to me seems that without Arrays class the JDK cannot be initialized at all. Also the Arrays static initializer is very short, it only initializes the assertion mechanism. This mechanism is used in many other classes (including, for example, java.lang.Throwable which is loaded very early). No other static initialization steps are performed in java.util.Arrays. Thus @Holger version seems incorrect to me.

Here we also found very interesting thing. The WeakHashMap.clear() still uses Arrays.fill. It's interesting when it appeared there, but unfortunately this goes to prehistoric times (it was already there in the very first public OpenJDK repository).

Next, @MarcoTopolnik says:

Safer surely not, but it might be faster when the fill call is not inlined and tab is short. On HotSpot both the loop and the explicit fill call will result in a fast compiler intrinsic (in a happy-day scenario).

It was actually surprising for me that Arrays.fill is not directly intrinsified (see intrinsic list generated by @apangin). Seems that such loop can be recognized and vectorized by JVM without explicit intrinsic handling. So it's true that extra call can be not inlined in very specific cases (for example if MaxInlineLevel limit is reached). On the other hand it's very rare situation and it's only a single call, it's not a call inside loop, and it's a static, not virtual/interface call, thus the performance improvement could be only marginal and only in some specific scenarios. Not the thing the JVM developers usually care.

Also it should be noted that even C1 'client' compiler (tier 1-3) is capable to inline Arrays.fill called, for example, in WeakHashMap.clear(), as inlining log (-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining) says:

36       3  java.util.WeakHashMap::clear (50 bytes)
     !m        @ 4   java.lang.ref.ReferenceQueue::poll (28 bytes)
                 @ 17   java.lang.ref.ReferenceQueue::reallyPoll (66 bytes)   callee is too large
               @ 28   java.util.Arrays::fill (21 bytes)
     !m        @ 40   java.lang.ref.ReferenceQueue::poll (28 bytes)
                 @ 17   java.lang.ref.ReferenceQueue::reallyPoll (66 bytes)   callee is too large
               @ 1   java.util.AbstractMap::<init> (5 bytes)   inline (hot)
                 @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
               @ 9   java.lang.ref.ReferenceQueue::<init> (27 bytes)   inline (hot)
                 @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                 @ 10   java.lang.ref.ReferenceQueue$Lock::<init> (5 bytes)   unloaded signature classes
               @ 62   java.lang.Float::isNaN (12 bytes)   inline (hot)
               @ 112   java.util.WeakHashMap::newTable (8 bytes)   inline (hot)

Of course, it's also easily inlined by smart and powerful C2 'server' compiler. Thus I see no problems here. Seems that @Marco version is incorrect either.

Finally we have a couple of comments from @StuartMarks (who is JDK developer, thus some official voice):

Interesting. My hunch is that this is a mistake. The review thread for this changeset is here and it references an earlier thread that is continued here. The initial message in that earlier thread points to a prototype of HashMap.java in Doug Lea's CVS repository. I don't know where this came from. It doesn't seem to match anything in the OpenJDK history.

... In any case, it might have been some old snapshot; the for-loop was in the clear() method for many years. The Arrays.fill() call was introduced by this changeset, so it was in the tree only for a few months. Note also that the power-of-two computation based on Integer.highestOneBit() introduced by this changeset also disappeared at the same time, though this was noted but dismissed during the review. Hmmm.

Indeed the HashMap.clear() contained the loop many years, was replaced with Arrays.fill on Apr 10th, 2013 and stayed less one half-a-year until Sept 4th when the discussed commit was introduced. The discussed commit was actually a major rewrite of the HashMap internals to fix JDK-8023463 issue. It was a long story about possibility to poison the HashMap with keys having duplicating hashcodes reducing HashMap search speed to linear making it vulnerable to DoS-attacks. The attempts to solve this were performed in JDK-7 including some randomization of String hashCode. So seems that the HashMap implementation was forked from the earlier commit, developed independently, then merged into the master branch overwriting several changes introduced in-between.

We may support this hypothesis performing a diff. Take the version where Arrays.fill was removed (2013-09-04) and compare it with previous version (2013-07-30). The diff -U0 output has 4341 lines. Now let's diff against the version prior to one when Arrays.fill was added (2013-04-01). Now diff -U0 contains only 2680 lines. Thus the newer version actually more similar to the older than to immediate parent.

Conclusion

So to conclude I would agree with Stuart Marks. There were no concrete reason to remove Arrays.fill, it's just because the in-between change was overwritten by mistake. Using Arrays.fill is perfectly fine both in JDK code and in user applications and used, for example, in WeakHashMap. The Arrays class is loaded anyways pretty early during the JDK initialization, has very simple static initializer and Arrays.fill method can be easily inlined even by client compiler, so no performance drawback should be noted.

Comments