Mahatma Mahatma - 1 month ago 17
C# Question

Where to call .AsParallel() in a LINQ query

The question



In a LINQ query I can correctly (as in: the compiler won't complain) call .AsParallel() like this:

(from l in list.AsParallel() where <some_clause> select l).ToList();


or like this:

(from l in list where <some_clause> select l).AsParallel().ToList();


what exactly is the difference?

What I've tried



Judging from the official documentation I've almost always seen the first method used so I thought that was the way to go.

Today tho, I've tried to run some benchmark myself and the result was surprising. Here's the code I've run:

var list = new List<int>();
var rand = new Random();
for (int i = 0; i < 100000; i++)
list.Add(rand.Next());

var treshold= 1497234;

var sw = new Stopwatch();

sw.Restart();
var result = (from l in list.AsParallel() where l > treshold select l).ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() before: {sw.ElapsedMilliseconds}");

sw.Restart();
result = (from l in list where l > treshold select l).AsParallel().ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() after: {sw.ElapsedMilliseconds}");


Output


call .AsParallel() before: 49

call .AsParallel() after: 4


So, apparently, despite what the documentation says, the second method is much faster. What's exactly happening here?

Answer

The trick to using AsParallel in general is to decide if the savings from parallelism overweight the overhead of doing things in parallel.

When conditions are easy to evaluate, such as yours, the overhead of making multiple parallel streams and collecting their results at the end greatly outweigh the benefit of performing comparisons in parallel.

When conditions are computationally intense, making AsParallel call early speeds things up quite a bit, because the overhead is now small in comparison to the benefit of running multiple Where computations in parallel.

For an example of a computationally hard condition consider a method that decides whether a number is prime or not. Doing this in parallel on a multi-core CPU will show significant improvement over the non-parallelised implementation.

Comments