Passing objects via clusterExport or as function arguments

Is it more efficient to pass objects to parallel::parLapply and parallel::parLapplyLB as function arguments or to export them with parallel::clusterExport? I.e.

parallel::parLapply(cl, 1:1000, function(y, x1, x2, x3, x4, x5) {
  ...
}, x1, x2, x3, x4, x5)

or

parallel::clusterExport(cl, c("x1", "x2", "x3", "x4", "x5"))
parallel::parLapply(cl, 1:1000, function(y) {
  ...
})

Non-parallel functions do not, by default, copy the arguments passed to them. They only create a copy when an object is modified. I was wondering whether the two parallel options above differ in how well they avoid unnecessary object copies.
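One way to compare the two options empirically is to time them on a small cluster. The following is only a rough sketch: the 2-worker cluster, the single vector standing in for x1 to x5, and the toy worker function are all assumptions made for illustration. With the default socket cluster created by makeCluster, the workers are separate R processes, so objects reach them by serialization in both variants and the copy-on-modify behaviour of a single R session does not carry over.

```r
library(parallel)

cl <- makeCluster(2)  # assumed: 2 local workers, default socket cluster

# Hypothetical stand-in for x1, ..., x5: one moderately large vector
x1 <- runif(1e6)

# Option 1: pass the object as an extra argument to the worker function
t_args <- system.time(
  res_args <- parLapply(cl, 1:20, function(y, x1) y + sum(x1), x1)
)

# Option 2: export the object once, then refer to it by name on the workers
clusterExport(cl, "x1", envir = environment())
t_export <- system.time(
  res_export <- parLapply(cl, 1:20, function(y) y + sum(x1))
)

stopCluster(cl)

# Timings depend on machine, object size and number of tasks
print(t_args)
print(t_export)
```

Both variants should return the same values; only the way x1 reaches the workers differs. Timings from a toy example like this may not reflect a real workload, so it is worth repeating the comparison with realistic object sizes.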

There is 1 answer

Econ_matrix

With a large data set, I ran into memory management difficulties with both of your versions. I can suggest the following instead:

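par_func <- function(my_list, x1, x2, x3, x4, x5, ncores){
  # Function applied to each element in the parallel loop
  loop_fun <- function(x){
    # x is the i-th element of my_list
    tryCatch({
      foo(x, x1, x2, x3, x4, x5) # the actual function that does the work
    }, error = function(err){
      # error_case <- foo2(x, x1, x2, x3, x4, x5) # optional fallback if foo fails
      error_case <- NULL  # or return NA or NULL so that one failure does not abort the run
      return(error_case)
    })
  }
  cl <- parallel::makeCluster(ncores)
  on.exit(parallel::stopCluster(cl)) # shut the workers down when the function exits
  # foo is not defined inside par_func, so the workers need their own copy
  # (assumes foo is defined in the global environment)
  parallel::clusterExport(cl, "foo")
  # Force the lazily evaluated arguments so that their values exist in this
  # function's environment, which is sent to the workers along with loop_fun
  x1 <- x1
  x2 <- x2
  x3 <- x3
  x4 <- x4
  x5 <- x5
  out <- parallel::parSapplyLB(cl = cl,
                               X = my_list,
                               FUN = function(x) loop_fun(x)
                               )
  return(out)
}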
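To make the pattern concrete, here is a compact, self-contained sketch of the same idea with a single extra object. The names foo, run_parallel and the toy inputs are hypothetical and chosen only for illustration. The sketch assumes a socket cluster with 2 workers and returns NA instead of NULL when a task fails.

```r
library(parallel)

# Hypothetical worker function: fails for negative input
foo <- function(x, x1) {
  if (x < 0) stop("negative input")
  x * x1
}

run_parallel <- function(my_list, x1, ncores = 2) {
  force(x1)  # evaluate the lazy argument so its value travels with the closure
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))  # always shut the workers down
  # foo is defined outside this function, so export it explicitly
  clusterExport(cl, "foo", envir = environment(foo))
  parLapplyLB(cl, my_list, function(x) {
    # A failing task yields NA instead of aborting the whole run
    tryCatch(foo(x, x1), error = function(err) NA_real_)
  })
}

out <- run_parallel(list(1, -2, 3), x1 = 10)
print(out)
```

The anonymous function passed to parLapplyLB is a closure, so the environment of run_parallel (including x1) is serialized together with it. That is why x1 needs no separate clusterExport call here, while foo, which lives outside that environment, does.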