pythonic - 6 months ago
Scala Question

How can I iterate a map in parallel in scala?

I have code like the following:

var arr = new Array[...](map.size)
var i = 0
for ((k, value) <- map) {
  arr(i) = (k, someFunc(value))
  i += 1
}

I want this loop to execute in parallel, for example on 8 separate threads. How can I achieve that in Scala?


You can convert the map into a parallel collection, and control the number of threads by overriding the default "TaskSupport" with a pool of size 8 (or any size you want):

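import scala.collection.parallel.ForkJoinTaskSupport
import scala.collection.parallel.immutable.ParMap

// Convert the (immutable) map into a parallel map
val parMap: ParMap[Int, Int] = map.par
// Run its operations on a pool of 8 threads instead of the default pool.
// On Scala 2.12+ prefer new java.util.concurrent.ForkJoinPool(8):
// scala.concurrent.forkjoin is deprecated in 2.12 and removed in 2.13.
parMap.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(8))

// Caution: arr and i are shared by all threads here and i += 1 is not atomic,
// so this loop is a race condition (see the immutable version below)
parMap.foreach { case (k, value) =>
  arr(i) = (k, someFunc(value))
  i += 1
}
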
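Note that the foreach above updates arr and i from several threads at once, which is a race condition: i += 1 is not atomic, so two threads can read the same i and write to the same slot. Removing all the mutable values makes the code both correct and more "idiomatic":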

val arr = parMap.map { case (k, value) => (k, someFunc(value)) }.toArray
val i = arr.length

EDIT: or, an even shorter version:

val arr = parMap.mapValues(someFunc).toArray
val i = arr.length
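Since Scala 2.13 the parallel collections live in a separate scala-parallel-collections module, so .par may not be available out of the box. As an alternative that is not part of the answer above, here is a minimal sketch that uses only the standard library's Futures on a fixed pool of 8 threads. The helper mapInParallel is hypothetical, and someFunc is a placeholder that simply squares its input.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelMapWithFutures {
  // Hypothetical stand-in for the question's someFunc.
  def someFunc(value: Int): Int = value * value

  // Hypothetical helper: applies f to every value of m on a dedicated pool
  // of `threads` threads and returns the (key, result) pairs as an array.
  def mapInParallel[K, V, R](m: Map[K, V], threads: Int)(f: V => R): Array[(K, R)] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One Future per entry; Future.traverse keeps the input order.
      val all: Future[Seq[(K, R)]] =
        Future.traverse(m.toSeq) { case (k, v) => Future((k, f(v))) }
      // Block until every entry has been processed.
      Await.result(all, Duration.Inf).toArray
    } finally {
      pool.shutdown() // release the worker threads
    }
  }

  def main(args: Array[String]): Unit = {
    val map = Map(1 -> 10, 2 -> 20, 3 -> 30)
    val arr = mapInParallel(map, threads = 8)(someFunc)
    arr.foreach(println)
  }
}
```

No shared mutable state is involved: each Future produces its own pair, and the results are collected once all of them have finished. The pool is shut down in the finally block because the threads of a fixed thread pool are non-daemon and would otherwise keep the JVM running.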