lserlohn - 22 days ago
R Question

How to use a shared data frame in parallel processing with foreach

I want to use the foreach package to parallelize a for loop.

The original code looks like this:

data_df = data.frame(...)    # data frame holding the original data
result_df = data.frame(...)  # data frame where the results are stored

for (i in 1:10) {
  a = data_df[i, ]$a
  b = data_df[i, ]$b
  sum_result = a + b
  sub_result = a - b
  result_df[i, ]$sum_result = sum_result
  result_df[i, ]$sub_result = sub_result
}


I use the index i as the row number, both to read values from one data frame and to store the results in another data frame.

However, if I change:

for(i in 1:10)


to

foreach( i=1:10) %dopar%


It runs much faster, but the result seems to be stored in only one column of the data frame. How can I save both columns together?

How should I write to the shared data frame so that the loop can run in parallel?

Sample data for data_df:

a b
1 1
2 4
4 8
9 6
2 3

Answer

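You should use .combine = rbind. With %dopar%, each iteration works on its own copy of the data, so assignments to result_df inside the loop body do not reach the data frame in your main session. Instead, have each iteration return its rows as a data frame and let foreach bind them together: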

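library(foreach)  # provides foreach() and %dopar%

# Each iteration returns a data frame; .combine = rbind stacks them into one
result = foreach(i = 1:5, .combine = rbind) %dopar% {
  data.frame(x = runif(40), i = i)
}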

> head(result)
          x i
1 0.2777559 1
2 0.2126995 1
3 0.2847905 1
4 0.8950941 1
5 0.4462353 1
6 0.7799849 1
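
Applied to the data in the question, the same pattern might look like the sketch below. It uses the five sample rows from the question and builds both result columns in the value returned by each iteration. One assumption to flag: registerDoSEQ() (a sequential backend that ships with foreach) is used only so the snippet runs without extra packages. For real parallelism you would register a parallel backend instead, for example with the doParallel package.

```r
library(foreach)

# Assumption: sequential backend so the sketch has no extra dependencies.
# Register a real parallel backend (e.g. via doParallel) to use several cores.
registerDoSEQ()

# Sample data from the question
data_df <- data.frame(a = c(1, 2, 4, 9, 2), b = c(1, 4, 8, 6, 3))

# Each iteration returns a one-row data frame with both result columns;
# .combine = rbind stacks the rows in iteration order.
result_df <- foreach(i = seq_len(nrow(data_df)), .combine = rbind) %dopar% {
  a <- data_df$a[i]
  b <- data_df$b[i]
  data.frame(sum_result = a + b, sub_result = a - b)
}

print(result_df)
```

Because the loop body returns a value rather than writing into result_df, nothing is shared between workers, and both columns arrive together in the combined result.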