sprinter sprinter - 1 month ago 4
Java Question

Why is passing two string arguments more efficient than one list argument

The code below calls two simple functions 10 billion times each.

public class PerfTest {
private long l = 0;

public static void main(String[] args) {
List<String> list = Arrays.asList("a", "b");
long time1 = System.currentTimeMillis();
for (long i = 0; i < 1E10; i++) {
func1("a", "b");
}
long time2 = System.currentTimeMillis();
for (long i = 0; i < 1E10; i++) {
func2(list);
}
System.out.println((time2 - time1) + "/" + (System.currentTimeMillis() - time2));
}

private static void func1(String s1, String s2) { l++; }
private static void func2(List<String> sl) { l++; }
}


My assumption was that the performance of these two calls would be close to identical. If anything I would have guessed that passing two arguments would be slightly slower than passing one. Given all arguments are object references I wasn't expecting the fact that one was a list to make any difference.

I have run the test many times and a typical result is "12781/30536". In other words, the call using two strings takes 13 secs and the call using a list takes 30 secs.

What is the explanation for this difference in performance? Or is this an unfair test? I have tried switching the two calls (in case it was due to startup effects) but the results are the same.

Answer

Actually @Rubins might be close--String isn't a primitive but the compiler should be freely able to rely on the fact that a reference to a given string is a constant.

If you were to call a pure function that didn't rely on any external information (Something I believe Java can figure out), Java could safely short-circuit the entire call if it was only being passed strings.

Let's say the function f is just:

public static String f(String a, String b){return a+b}

then:

str1="a"
str2="b"
str3=str2 // Note that now &str3 == &str2, they reference the same object

x=f(str1, str2) // "ab"
y=f(str1, str3) // "ab"

Java can see this pattern, that any time f is called with (&str1 and &str2) it can just return "ab" and not even execute the call.

str3="c" // Now you are changing &str3 to be != &str2
y=f(str1, str3) // "ac", but since the second address passed in is 
                // different, Java knows not to just return "ab"

In essence, the call to f(str1, str2) is passing 2 "pointers" (kinda), but since they are immutable strings Java can be certain that if f is EVER called with the same two pointers (&str1 and &str2 to steal a little c notation) it can just return the same thing it did last time.

When f(array) is called, however, the call cannot be shortcut because:

public static String f(String[] a){return a[0]+a[1]}

String[] array =new String(){"a","b"};
x=f(array); // will be "ab" for f(&array)
array[1]="c";
y=f(array); // is now "ac" for same f(&array)!  Can't shortcut!

Assuming f is a pure function (that doesn't reference any external variables), the string version cannot fail to assert--the results must be the same for the same address parameters.

For the array, it must call and pass the array or recursively test the contents of the entire array and all it's children every time.

I'm not certain this is what's going on, but I'd bet a few bucks on it.

Immutability is fairly useful and a pretty good habit, there is a good reason Java Strings are immutable.

It might be interesting to pass in two simple mutable objects (with just one non-final field) and see how it performs.

(Sorry if my c/reference terminology is off, I think it's right but I haven't dealt with C in decades. I also realize I'm probably missing a level of dereferencing in Java since objects in memory can move, but I think optimization knows that and the system acts as I've described.)

Comments