java8.being java8.being - 6 months ago 14
Java Question

How is Stream more efficient?

If you downvote this question, please leave a comment explaining why.

First of all, pardon me if this seems too naive. I am trying to digest the Stream package, and it seems very difficult for me to understand.

I was reading the Stream package documentation, and at one point I tried to implement it to learn by doing. This is the text I read:


Intermediate operations return a new stream. They are always lazy;
executing an intermediate operation such as filter() does not actually
perform any filtering, but instead creates a new stream that, when
traversed, contains the elements of the initial stream that match the
given predicate. Traversal of the pipeline source does not begin until
the terminal operation of the pipeline is executed.
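
A minimal sketch of what the quoted paragraph describes, assuming a small hypothetical word list and a counter (`calls`) that is not part of the original question: the predicate passed to filter() is not invoked when the pipeline is built, only once a terminal operation runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterDemo {
    // Returns {predicate calls before the terminal operation, predicate calls after it}.
    static int[] countPredicateCalls() {
        // Hypothetical sample data, chosen only for illustration.
        List<String> words = Arrays.asList("alpha", "beta", "gamma", "delta");
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline only records the filter step; no element is tested yet.
        Stream<String> filtered = words.stream().filter(w -> {
            calls.incrementAndGet();
            return w.length() == 5;
        });
        int before = calls.get();

        // collect() is a terminal operation, so the source is traversed here.
        List<String> result = filtered.collect(Collectors.toList());
        int after = calls.get();

        System.out.println("Matches: " + result);
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = countPredicateCalls();
        System.out.println("Predicate calls before terminal operation: " + counts[0]);
        System.out.println("Predicate calls after terminal operation: " + counts[1]);
    }
}
```

With this data the counter reads 0 before collect() and 4 afterwards, one call per source element.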


I understand this much: they return a new Stream. So my first question is: is creating a stream, without traversing it, a heavy operation?

Now, intermediate operations are lazy and terminal operations are eager, and streams are meant to be more efficient and more readable than the old if-else style of programming.


Processing streams lazily allows for significant efficiencies; in a
pipeline such as the filter-map-sum example above, filtering, mapping,
and summing can be fused into a single pass on the data, with minimal
intermediate state. Laziness also allows avoiding examining all the
data when it is not necessary; for operations such as "find the first
string longer than 1000 characters", it is only necessary to examine
just enough strings to find one that has the desired characteristics
without examining all of the strings available from the source. (This
behavior becomes even more important when the input stream is infinite
and not merely large.)
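
The short-circuiting described in this passage can be sketched as follows, assuming a sequential stream over the same kind of list used later in this question. The `examined` counter is a hypothetical addition used only to make the behavior visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitDemo {
    // Returns how many elements the filter predicate examined before findFirst() stopped.
    static int elementsExamined() {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            strings.add("String" + i);
        }

        AtomicInteger examined = new AtomicInteger();
        Optional<String> first = strings.stream()
                .filter(s -> {
                    examined.incrementAndGet();
                    return s.contains("99");
                })
                .findFirst(); // short-circuits as soon as one element passes the filter

        System.out.println("First match: " + first.orElse("none"));
        return examined.get();
    }

    public static void main(String[] args) {
        System.out.println("Elements examined: " + elementsExamined());
    }
}
```

The first match is String99 at index 99, so the predicate runs on 100 of the 10000 elements and the rest are never touched.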


To demonstrate this, I wrote a small program to understand the concept. Here is the program:

    List<String> stringList = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        stringList.add("String" + i);
    }
    long start = System.currentTimeMillis();
    Stream stream = stringList.stream().filter(s -> s.contains("99"));
    long midEnd = System.currentTimeMillis();
    System.out.println("Time is millis before applying terminal operation: " + (midEnd - start));
    System.out.println(stream.findFirst().get());
    long end = System.currentTimeMillis();
    System.out.println("Whole time in millis: " + (end - start));
    System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

    start = System.currentTimeMillis();
    for (String ss1 : stringList) {
        if (ss1.contains("99")) {
            System.out.println(ss1);
            break;
        }
    }
    end = System.currentTimeMillis();
    System.out.println("Time in millis with old standard: " + (end - start));


I have executed this program many times, and each time it suggested that creating a new stream from an intermediate operation is the heavy part. The terminal operation takes very little time compared to the intermediate operation.

And overall, the old if-else pattern is way more efficient than streams. So, again, more questions:


  1. Did I misunderstand something?

  2. If I understand correctly, why and when should I use streams?

  3. If I am doing or understanding anything wrong, can you please help clarify the concepts of the java.util.stream package?



Actual Numbers:

Try 1:

Time is millis before applying terminal operation: 73
String99
Whole time in millis: 76
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0


Try 2:

Time is millis before applying terminal operation: 56
String99
Whole time in millis: 59
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0


Try 3:

Time is millis before applying terminal operation: 69
String99
Whole time in millis: 72
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0


These are my machine details, in case this helps:

Memory: 11.6 GiB
Processor: Intel® Core™ i7-3632QM CPU @ 2.20GHz × 8
OS-Type: 64-bit


Note: I was unable to find an answer to this, so if it seems to be a duplicate, please mark it as such.

Answer

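As others have already noted, your benchmark is flawed. The main problem is that the results are skewed by ignoring compilation time: the first use of a stream pays a one-time cost for loading and compiling the stream and lambda code, and your measurement charges that cost to filter. Try the following: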

    Stream stream = stringList.stream().filter(s -> s.contains("99"));
    long   start  = System.currentTimeMillis();
    stream = stringList.stream().filter(s -> s.contains("99"));
    long   midEnd = System.currentTimeMillis();

Now the code that backs filter is already compiled and the second call is fast. Even this would work:

    Stream stream = stringList.stream().map(s -> s);
    long   start  = System.currentTimeMillis();
    stream = stringList.stream().filter(s -> s.contains("99"));
    long   midEnd = System.currentTimeMillis();

map shares most of the code with filter, so calling filter is fast here, too, because the code is already compiled. And in case you ask: Calling filter or map on a different stream would work too, of course.

Your "old style" code doesn't require additional compilation.
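
Putting the pieces above together, here is a rough, self-contained sketch that times pipeline creation twice in the same run. The class and method names (`WarmUpDemo`, `buildList`, `timePipelineCreation`) are hypothetical, and `System.nanoTime()` replaces `System.currentTimeMillis()` only for finer resolution. Hand-rolled timing like this stays noisy, so treat the numbers as an illustration rather than a proper benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class WarmUpDemo {
    // Builds the same kind of list as in the question: "String0" .. "String<size-1>".
    static List<String> buildList(int size) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add("String" + i);
        }
        return list;
    }

    // Times only the creation of the filtered stream, in nanoseconds.
    // The terminal operation runs after the clock has been stopped.
    static long timePipelineCreation(List<String> list) {
        long start = System.nanoTime();
        Stream<String> stream = list.stream().filter(s -> s.contains("99"));
        long elapsed = System.nanoTime() - start;
        stream.findFirst(); // consume the stream so the pipeline is actually used
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> list = buildList(10000);
        // The first call pays the one-time setup cost; the second reuses what is already loaded.
        long first = timePipelineCreation(list);
        long second = timePipelineCreation(list);
        System.out.println("First pipeline creation (ns): " + first);
        System.out.println("Second pipeline creation (ns): " + second);
    }
}
```

On a typical JVM the second figure should be far smaller than the first, though the exact values depend on the machine and JVM version. For serious measurements, a benchmark harness such as JMH is the usual choice.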
