Jurat Jurat - 1 year ago 93
R Question

How to efficiently iterate for sum over data.frame objects in the grid?

I have a list of numeric data.frame objects and want to add them pairwise. If I arrange all pairs in a grid, every off-diagonal pair appears twice, so I only need to iterate over the upper (or lower) triangle of the grid, including the diagonal, and compute the sums there. I wrote a simple R function for this, but it is inefficient because the same sums are computed more than once. I bet there must be a more intuitive and efficient way to do this. Does anyone have a better solution for summing data.frame objects arranged in a grid? Any suggestions for improving my function? Thanks

simulated data

fo <- data.frame( start=seq(1, by=4, len=6), stop=seq(3, by=4, len=6))
ba <- data.frame(start=seq(5, by=2, len=7), stop=seq(7, by=2, len=7))
bleh <- data.frame(start=seq(1, by=5, len=5), stop=seq(3, by=5, len=5))

mylist <- list(fo, ba, bleh)

my custom function

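add_pairDF <- function(set, idx=1L) {
  # the data.frame that every other element is added to
  quer <- set[[idx]]
  # diagonal entry: the data.frame added to itself
  .quer <- mapply('+', quer, quer)
  # column-wise sum of quer with each remaining data.frame
  supp <- lapply(set[-idx], function(ele_) {
    mapply('+', quer, ele_)
  })
  c(list(.quer), supp)
}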

initial output (repetition exists):

ans_1 <- add_pairDF(set=mylist, idx=1L)
ans_2 <- add_pairDF(set=mylist, idx=2L)
ans_3 <- add_pairDF(set=mylist, idx=3L)

desired output:

My function sums each pair of data.frame objects, but I don't think I should need to call it three times with a different index into mylist.

I want to avoid the repeated sums by walking only the lower or upper triangle (including the diagonal) of the grid formed by all pairs of data.frame objects. How can I avoid this repetition? What is an efficient way to iterate over data.frame objects arranged in a grid? Can anyone suggest ideas for solving this?

Answer Source

Here's one way.

# Column-wise sum of two data frames
add_df <- function(df1, df2) {
   mapply("+", df1, df2)
}

# Get all pairs of indices
ndf <- length(mylist)
idx <- expand.grid(1:ndf, 1:ndf)
idx <- idx[idx[,1] <= idx[,2],] 

Map(function(i, j) add_df(mylist[[i]], mylist[[j]]), idx[,1], idx[,2] )

I'm not sure what the intention is when adding data frames with different numbers of rows. But if you want to add only the rows that both data frames have in common, you can replace add_df with:

add_df <- function(df1, df2) {
   # number of rows the two data frames have in common
   nr <- min(nrow(df1), nrow(df2))
   df1[seq_len(nr),] + df2[seq_len(nr),]
}
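As a variation, the index pairs of the upper triangle can be generated directly instead of filtering the full expand.grid result. The following is only a sketch under some assumptions: the list is given names so the results can be labelled, and add_common is a hypothetical helper name that mirrors the truncating add_df above. It is not meant as the definitive approach.

```r
# Same simulated data as in the question, with names added for labelling
fo   <- data.frame(start = seq(1, by = 4, length.out = 6), stop = seq(3, by = 4, length.out = 6))
ba   <- data.frame(start = seq(5, by = 2, length.out = 7), stop = seq(7, by = 2, length.out = 7))
bleh <- data.frame(start = seq(1, by = 5, length.out = 5), stop = seq(3, by = 5, length.out = 5))
mylist <- list(fo = fo, ba = ba, bleh = bleh)

# Hypothetical helper: add only the rows both data frames have in common
add_common <- function(df1, df2) {
  nr <- min(nrow(df1), nrow(df2))
  df1[seq_len(nr), ] + df2[seq_len(nr), ]
}

n <- length(mylist)
# (row, col) positions of the upper triangle, diagonal included
pairs <- which(upper.tri(diag(n), diag = TRUE), arr.ind = TRUE)

# One sum per unordered pair: n * (n + 1) / 2 results instead of n^2
res <- Map(function(i, j) add_common(mylist[[i]], mylist[[j]]),
           pairs[, "row"], pairs[, "col"])
names(res) <- paste(names(mylist)[pairs[, "row"]],
                    names(mylist)[pairs[, "col"]], sep = "+")
```

With three data frames this computes six sums rather than nine, and each result can be looked up by a label such as res[["fo+ba"]].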

EDIT: I replaced mapply with Map to make sure the result is a list in the latter case.
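To illustrate the point of the EDIT, as I read it: when the function applied to each pair returns a data frame, mapply tries to simplify the results and produces a matrix of list cells, while Map always returns a plain list. The toy data frames a and b below are made up for this sketch and are not from the question.

```r
# Two small made-up data frames with the same shape
a <- data.frame(start = 1:3, stop = 4:6)
b <- data.frame(start = 10:12, stop = 13:15)
dfs <- list(a, b)

# Index pairs of the upper triangle for two elements: (1,1), (1,2), (2,2)
i <- c(1, 1, 2)
j <- c(1, 2, 2)

# mapply simplifies: every result has length 2 (two columns),
# so the results are packed into a 2 x 3 matrix of list cells
m <- mapply(function(i, j) dfs[[i]] + dfs[[j]], i, j)

# Map never simplifies: a list with one data frame per pair
l <- Map(function(i, j) dfs[[i]] + dfs[[j]], i, j)

# mapply with SIMPLIFY = FALSE behaves like Map
l2 <- mapply(function(i, j) dfs[[i]] + dfs[[j]], i, j, SIMPLIFY = FALSE)
```

Here l[[2]] is the data frame a + b, whereas the same values in m are spread over separate cells of the matrix.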
