Christoph_J - 10 months ago 47

R Question

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)

After some research, I noticed that only one core is currently used and parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package

`foreach`

`multicore`

`pvec`

However, if I understand it correctly, this function only takes

`data.frame`

`mapply`

Here is a small example of what I want to do: (Note that I include a

`plyr`

`mapply`

`foreach`

`library(plyr)`

library(foreach)

n <- 10000

df <- data.frame(P = rnorm(n, mean=100, sd=10),

B0 = rnorm(n, mean=40, sd=5),

CF1 = rnorm(n, mean=30, sd=10),

CF2 = rnorm(n, mean=30, sd=5),

CF3 = rnorm(n, mean=90, sd=8))

get_uniroot <- function(P, B0, CF1, CF2, CF3) {

uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3},

lower = 1,

upper = 10,

tol = 0.00001)$root

}

system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))

#user system elapsed

#0.91 0.00 0.90

system.time(x2 <- mdply(df, get_uniroot))

#user system elapsed

#5.85 0.00 5.85

system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {

get_uniroot(P, B0, CF1, CF2, CF3)})

#user system elapsed

# 10.30 0.00 10.36

all.equal(x1, x2$V1) #TRUE

all.equal(x1, x3) #TRUE

Also, I tried to implement Ryan Thompson's function chunkapply from the SO link above (only got rid of

`doMC`

but didn't get it to work. However, since it uses

`foreach`

`#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))`

#Error in { : task 1 failed - "invalid function value in 'zeroin'"

PS: I know that I could just increase

`tol`

`tol`

Answer Source

I'd use the `parallel`

package that's built into R 2.14 and work with matrices. You could then simply use `mclapply`

like this:

```
dfm <- as.matrix(df)
result <- mclapply(seq_len(nrow(dfm)),
function(x) do.call(get_uniroot,as.list(dfm[x,])),
mc.cores=4L
)
unlist(result)
```

This is basically doing the same mapply does, but in a parallel way.

**But...**

Mind you that parallelization always counts for some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function calculates significantly longer than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider to cut your data frame in bigger chunks, and combine both mapply and mclapply. A possible way to do this is:

```
ncores <- 4
id <- floor(
quantile(0:nrow(df),
1-(0:ncores)/ncores
)
)
idm <- embed(id,2)
mapply_uniroot <- function(id){
tmp <- df[(id[1]+1):id[2],]
mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
result <-mclapply(nrow(idm):1,
function(x) mapply_uniroot(idm[x,]),
mc.cores=ncores)
final <- unlist(result)
```

This might need some tweaking, but it essentially breaks your df in exactly as many bits as there are cores, and run the mapply on every core. To show this works :

```
> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
```