BayerSe - 1 month ago
R Question

R: Is it possible to fill values in a data.frame in parallel?

I need to run simulations with a variety of options.
My preferred approach would be to create a setup data.frame that I fill during the simulation:

setup <- expand.grid(
  option_1 = c(1, 2),
  option_2 = c(1, 2),
  value    = NA
)


Then I loop over the rows:

for (idx in 1:nrow(setup)) {
  fit <- fun(a = setup$option_1[idx], b = setup$option_2[idx])
  setup$value[idx] <- fit$value
}


Is it possible to parallelize the loop? I tried foreach without success. Are there other possibilities?

Here is an example function:

fun <- function(a, b) {
  list(value = a * b)
}

Answer

Using this data.frame...

setup <- expand.grid(
  option_1 = c(1, 2),
  option_2 = c(1, 2),
  value    = NA
)

... let's create a copy of setup to compare the results:

setup1 <- setup

Then we run your for loop with fun():

for (idx in 1:nrow(setup1)) {
  fit <- fun(a = setup1$option_1[idx], b = setup1$option_2[idx])
  setup1$value[idx] <- fit$value
}
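Each iteration reads only row idx and writes only row idx, so the rows are independent of each other, which is exactly what makes this loop parallelizable. As a sequential sketch (not part of the original answer), the same loop can be written as a map over rows with base R's mapply(); the name setup_map is just a hypothetical copy so the other data.frames stay untouched.

```r
# Example function from the question
fun <- function(a, b) {
  list(value = a * b)
}

setup_map <- expand.grid(
  option_1 = c(1, 2),
  option_2 = c(1, 2),
  value    = NA
)

# mapply() walks option_1 and option_2 element-wise and returns one
# value per row, in row order, so it can be assigned to the whole column
setup_map$value <- mapply(
  function(a, b) fun(a = a, b = b)$value,
  setup_map$option_1,
  setup_map$option_2
)

print(setup_map)
```

Once the loop is expressed as a map like this, swapping in a parallel map is a small step.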

And this is one way to run the loop in parallel with foreach:

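library(foreach)
library(doSNOW)

cl <- makeCluster(3, type = "SOCK") # start 3 worker processes
registerDoSNOW(cl)

setup$value <- foreach(idx = seq_len(nrow(setup)), .combine = c, .inorder = TRUE) %dopar% {
  # fun() and setup are referenced in the loop body, so foreach exports them to the workers
  fit <- fun(setup$option_1[idx], setup$option_2[idx])
  fit$value
}

stopCluster(cl) # shut down the workers when done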

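Note: .inorder = TRUE is foreach's default, but it matters here. The results are combined with c() and assigned to the column by position, so with .inorder = FALSE they could be combined in completion order and end up in the wrong rows of your setup data.frame.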
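An alternative that does not depend on result order at all is to let each task return its row index together with its value and then assign by index. A minimal base-R sketch, where lapply() over reversed indices merely simulates results arriving out of order; with foreach you could return the same c(idx, value) pair from %dopar% and use .combine = rbind. The name setup_idx is a hypothetical copy.

```r
fun <- function(a, b) {
  list(value = a * b)
}

setup_idx <- expand.grid(
  option_1 = c(1, 2),
  option_2 = c(1, 2),
  value    = NA
)

# Process rows in reverse to mimic out-of-order completion;
# each result carries the row index it belongs to
results <- lapply(rev(seq_len(nrow(setup_idx))), function(idx) {
  fit <- fun(setup_idx$option_1[idx], setup_idx$option_2[idx])
  c(idx = idx, value = fit$value)
})

# Stack into a matrix with columns "idx" and "value"
res <- do.call(rbind, results)

# Assign by index, so the arrival order of the results does not matter
setup_idx$value[res[, "idx"]] <- res[, "value"]

print(setup_idx)
```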

Let's check if the two results are identical:

identical(setup, setup1)
# [1] TRUE

The result looks like this:

setup1
#   option_1 option_2 value
# 1        1        1     1
# 2        2        1     2
# 3        1        2     2
# 4        2        2     4
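As for other possibilities: base R's parallel package can do the same without foreach. A minimal sketch, assuming 2 worker processes; clusterMap() is the parallel counterpart of mapply() and returns its results in input order, so they can be assigned directly to the rows. The name setup_par is a hypothetical copy.

```r
library(parallel)

fun <- function(a, b) {
  list(value = a * b)
}

setup_par <- expand.grid(
  option_1 = c(1, 2),
  option_2 = c(1, 2),
  value    = NA
)

# Start 2 worker processes (the default PSOCK type works on all platforms)
cl <- makeCluster(2)

# Calls fun(a = option_1[i], b = option_2[i]) on the workers;
# the result is a list of fits in the same order as the rows
fits <- clusterMap(cl, fun, a = setup_par$option_1, b = setup_par$option_2)

stopCluster(cl)

# Pull the value element out of each fit
setup_par$value <- vapply(fits, function(fit) fit$value, numeric(1))

print(setup_par)
```

Because fun is passed to clusterMap() as an argument, it is sent to the workers along with the data, so no separate export step is needed here.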