datageek datageek - 4 months ago 9
R Question

How to get pairs of data.tables from the upper or lower triangle of a square grid?

I have three data.tables, each of which needs to be paired with the others in a square grid. I only want to walk over the pairs in the upper (or lower) triangle of that grid. I bet this is fairly straightforward in other programming languages like Java, but I don't know how to do it in R. Does anyone know a trick for doing this easily?

data



library(data.table)

# Note: using `<-` inside list() also creates a, b and c in the workspace;
# the list elements themselves stay unnamed.
mylist <- list(
  a <- data.table(
    start = seq(1, by = 9, len = 10), stop = seq(6, by = 9, len = 10),
    ID = letters[1:10], score = sample(1:25, 10, replace = FALSE)),
  b <- data.table(
    start = seq(2, by = 11, len = 10), stop = seq(8, by = 11, len = 10),
    ID = letters[1:10], score = sample(1:25, 10, replace = FALSE)),
  c <- data.table(
    start = seq(4, by = 11, len = 10), stop = seq(9, by = 11, len = 10),
    ID = letters[1:10], score = sample(1:25, 10, replace = FALSE))
)


All possible pairs in the square grid (written out by hand):



# pair labels only, filled row by row
grid <- matrix(c("a,a", "a,b", "a,c",
                 "b,a", "b,b", "b,c",
                 "c,a", "c,b", "c,c"), nrow = 3, ncol = 3, byrow = TRUE)


I couldn't find a proper way to create the grid object efficiently, so I roughly sketched the grid by hand.

desired output:



The grid object could be a matrix or some other representation. Below is scratch code; imagine that get.ovlp returns the grid representation mentioned above, except that overlapping pairs are repeated. My objective is to remove these repeated pairs by walking only over the upper/lower triangle of the square grid.

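library(data.table)
mylist <- list(a, b, c)

# overlap the idx-th table (the query) against every other table in the set
get.ovlp <- function(set, idx = 1L) {
  que <- set[[idx]]
  supp <- lapply(set[-idx], function(ele_) {
    # foverlaps() requires y to be keyed on its interval columns;
    # copy() leaves the original table untouched
    data.table::foverlaps(que, setkey(copy(ele_), start, stop))
  })
  return(supp)
}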


The get.ovlp function is just a toy example showing how repeated pairwise overlaps arise, as in the grid above.

I want to walk only over the pairs in the upper/lower triangle (including the diagonal) of the square grid above, and then apply the foverlaps function from the data.table package to each pair. Can anyone propose ideas to solve this problem efficiently? Thanks a lot.

Answer

If I understand you correctly, you want to apply a function to pairs of elements in mylist, e.g. ("a", "b"). You could, for example, do this (I use merge as a stand-in for the function):

require(data.table)

# your data (I named the elements a, b, and c)
mylist <- list(a = data.table(start=seq(1, by=9, len=10), stop=seq(6, by=9, len=10),
                              ID=letters[seq(1:10)], score=sample(1:25, 10, replace = FALSE)),
               b = data.table(start=seq(2, by=11, len=10), stop=seq(8, by=11, len=10),
                              ID=letters[seq(1:10)], score=sample(1:25, 10, replace = FALSE)),
               c = data.table(start=seq(4, by=11, len=10), stop=seq(9, by=11, len=10),
                              ID=letters[seq(1:10)], score=sample(1:25, 10, replace = FALSE)))


# build pairs on upper triangle
# utilise fact that >= is meaningful for characters
dt_idx = CJ(i = names(mylist), j = names(mylist))[j >= i]

# apply function (here merge) by i, j:
dt_idx[,
       j = merge(x = mylist[[i]], y = mylist[[j]], by = c('start', 'stop', 'ID')),
       by = list(i, j)]
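
Since the question is ultimately about foverlaps rather than merge, here is a minimal sketch of the same pairing idea with foverlaps swapped in. It is only an illustration under a few assumptions: the tables are rebuilt without the random score column so the result is reproducible, each table is keyed on start and stop (foverlaps needs its second argument keyed), and nomatch = NULL (available in reasonably recent data.table versions) drops rows without any overlap. The names dt_pairs and ovlp are made up for this example.

```r
library(data.table)

# Same intervals as in the question, without the random score column
mylist <- list(
  a = data.table(start = seq(1L, by = 9L,  length.out = 10L),
                 stop  = seq(6L, by = 9L,  length.out = 10L), ID = letters[1:10]),
  b = data.table(start = seq(2L, by = 11L, length.out = 10L),
                 stop  = seq(8L, by = 11L, length.out = 10L), ID = letters[1:10]),
  c = data.table(start = seq(4L, by = 11L, length.out = 10L),
                 stop  = seq(9L, by = 11L, length.out = 10L), ID = letters[1:10])
)

# Assumption: key a copy of every table on its interval columns,
# because foverlaps() requires y to be keyed
mylist <- lapply(mylist, function(d) setkeyv(copy(d), c("start", "stop")))

# Upper-triangle pairs, diagonal included
dt_pairs <- CJ(i = names(mylist), j = names(mylist))[j >= i]

# One foverlaps() call per pair; the results are stacked into one table
ovlp <- dt_pairs[,
                 foverlaps(mylist[[i]], mylist[[j]], nomatch = NULL),
                 by = .(i, j)]
```

In the result, the columns i and j identify the pair, the unprefixed start/stop/ID columns come from the second table of the pair, and the i.-prefixed columns come from the first. If you do not want a table overlapped with itself, replace j >= i with j > i.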

Note: If the >= comparison on the list names is no longer "meaningful" (because the names are not ordered and/or are more complicated), you can always use an integer index and apply the same logic:

dt_idx = CJ(i = seq.int(length(mylist)), j = seq.int(length(mylist)))[j >= i]
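
For comparison, the same integer pairs can be obtained in base R with upper.tri, which may be easier to read if you think of the problem as a matrix. This is a rough sketch, not part of the approach above; the names in nms are hypothetical and deliberately unsorted to show that only positions matter.

```r
# Hypothetical list names, deliberately not in alphabetical order
nms <- c("zeta", "alpha", "mid")
n <- length(nms)

# Logical n x n mask that is TRUE on and above the diagonal
mask <- upper.tri(matrix(NA, n, n), diag = TRUE)

# Row/column positions of the TRUE cells
idx <- which(mask, arr.ind = TRUE)

# One row per pair: integer positions plus the matching names
idx_pairs <- data.frame(i = idx[, "row"], j = idx[, "col"],
                        first = nms[idx[, "row"]], second = nms[idx[, "col"]])
```

Each row of idx_pairs can then be used as mylist[[i]] and mylist[[j]], exactly like the integer version of dt_idx. Setting diag = FALSE leaves out the self-pairs.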