Peter Dutton - 1 month ago
R Question

R parallel programming with globally defined S4 classes

I don't understand how to pass a globally defined class to a cluster generated using the parallel package. I have it working for a function:

funs = "testClass"
fun = function(x) testClass(test = x^2)
testClass = function(test) return(test)

cl <- parallel::makeCluster(2, outfile='')
parallel::clusterExport(cl = cl, varlist = funs, envir = globalenv())
res <- parallel::parLapply(cl = cl, X = seq_len(10L), fun = fun)
parallel::stopCluster(cl)
res


The same approach does not work for a class:

funs = "testClass"
fun = function(x) testClass(test = x^2)
testClass = setClass("testClass", slots = c(test = "numeric"))

cl <- parallel::makeCluster(2, outfile='')
parallel::clusterExport(cl = cl, varlist = funs, envir = globalenv())
res <- parallel::parLapply(cl = cl, X = seq_len(10L), fun = fun)
parallel::stopCluster(cl)


I know it is possible to put the class and generator function in a package, but is there a simpler solution to this problem?

Answer

Defining an S4 class does more than create the generator function: it also stores hidden metadata objects in your global environment. It's not enough to copy the generator function to your worker nodes; you have to execute the class definition statement on each node. (Well, you could copy those metadata objects over, but that's just asking for trouble.)
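To see the metadata in question, here is a minimal sketch that assumes it is run at top level in a fresh session. The names `visible`, `everything` and `classMeta` are only illustrative. The class definition should show up under a dot-prefixed name starting with `.__C__`, which plain `ls()` does not list.

```r
library(methods)

# Define the class at top level, as in the question.
testClass <- setClass("testClass", slots = c(test = "numeric"))

# ls() hides dot-prefixed names by default; all.names = TRUE reveals them.
visible <- ls(globalenv())
everything <- ls(globalenv(), all.names = TRUE)

# Class metadata is stored under names starting with ".__C__".
classMeta <- grep("^\\.__C__", everything, value = TRUE)
print(classMeta)
```

Exporting only `testClass` leaves this object behind, whereas running `setClass()` on a worker creates it there.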

cl <- parallel::makeCluster(2, outfile='')
parallel::clusterEvalQ(cl, expr={
    testClass <- setClass("testClass", slots = c(test = "numeric"))
})
# fun is the function defined in the question
res <- parallel::parLapply(cl = cl, X = seq_len(10L), fun = fun)
parallel::stopCluster(cl)

res

# [[1]]
# An object of class "testClass"
# Slot "test":
# [1] 1
#
# [[2]]
# An object of class "testClass"
# Slot "test":
# [1] 4
# . . .
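
One way to check the difference between the two approaches is to ask each worker whether it knows the class. The following is a sketch only, assuming a local two-worker cluster and a top-level script. The names `before`, `after` and `checks` are illustrative, and `methods::isClass()` is used as the test. Exporting the generator alone should leave the class unknown on the workers, while evaluating the definition there should register it.

```r
library(parallel)

testClass <- setClass("testClass", slots = c(test = "numeric"))

cl <- makeCluster(2)
checks <- tryCatch({
  # Copy only the generator function, as the question does.
  clusterExport(cl, "testClass", envir = environment())
  before <- unlist(clusterEvalQ(cl, methods::isClass("testClass")))

  # Now run the class definition itself on each worker.
  clusterEvalQ(cl, {
    testClass <- setClass("testClass", slots = c(test = "numeric"))
  })
  after <- unlist(clusterEvalQ(cl, methods::isClass("testClass")))

  list(before = before, after = after)
}, finally = stopCluster(cl))  # always shut the workers down

print(checks)
```

Wrapping the work in `tryCatch(..., finally = stopCluster(cl))` is a design choice so the workers are released even if a step fails.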