datageek - 9 months ago 40

R Question

I have three data.tables, and each needs to be paired with every other one (and with itself), as in a square grid. I want to visit only the pairs in the upper (or lower) triangle of that grid. I bet this is straightforward in other programming languages such as Java, but I don't know how to do it in R. Does anyone know a trick for doing this easily?

```
mylist <- list(
  a <- data.table(
    start=seq(1, by=9, len=10), stop=seq(6, by=9, len=10),
    ID=letters[seq(1:10)], score=sample(1:25, 10, replace = FALSE)),
  b <- data.table(
    start=seq(2, by=11, len=10), stop=seq(8, by=11, len=10),
    ID=letters[seq(1:10)], score=sample(1:25, 10, replace = FALSE)),
  c <- data.table(
    start=seq(4, by=11, len=10), stop=seq(9, by=11, len=10),
    ID=letters[seq(1:10)], score=sample(1:25, 10, replace = FALSE))
)
```

```
# pseudocode, not valid R: the 3 x 3 grid of pairs I have in mind
grid <- matrix((a,a), (a,b), (a,c),
               (b,a), (b,b), (b,c),
               (c,a), (c,b), (c,c), 3, 3)
```

I couldn't find a proper way to create the grid object efficiently, so I sketched it out by hand.

The grid object could be a matrix or some other representation. Below is scratch code; imagine that get.ovlp returns the grid representation mentioned above, but with overlapping pairs repeated. My objective is to remove these repeated pairs by walking only the upper (or lower) triangle of the square grid.

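```
library(data.table)

mylist <- list(a,b,c)

get.ovlp <- function(set, idx=1L) {
  que <- set[[idx]]
  supp <- lapply(set[-idx], function(ele_) {
    # foverlaps() requires y to be keyed on its interval columns;
    # key a copy so the caller's table is left unchanged
    ele_ <- setkeyv(copy(ele_), c("start", "stop"))
    data.table::foverlaps(que, ele_)
  })
  return(supp)
}
```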

get.ovlp is just a toy example showing how repeated pairwise overlaps arise, as in the grid above.

I want to walk only the pairs in the upper (or lower) triangle, including the diagonal, of the square grid above, and then apply the foverlaps function from the data.table package to each pair. Can anyone propose an efficient way to solve this problem? Thanks a lot.

Answer

If I understand you correctly, you want to apply a function to pairs of elements found in `mylist`, e.g. `("a", "b")`. You could, for example, do this (I use `merge` as an example function):

```
library(data.table)
# your data (I named the elements a, b, and c)
mylist <- list(
  a = data.table(start=seq(1, by=9, len=10), stop=seq(6, by=9, len=10),
                 ID=letters[1:10], score=sample(1:25, 10, replace = FALSE)),
  b = data.table(start=seq(2, by=11, len=10), stop=seq(8, by=11, len=10),
                 ID=letters[1:10], score=sample(1:25, 10, replace = FALSE)),
  c = data.table(start=seq(4, by=11, len=10), stop=seq(9, by=11, len=10),
                 ID=letters[1:10], score=sample(1:25, 10, replace = FALSE)))
# build pairs on upper triangle (diagonal included)
# utilise fact that >= is meaningful for characters
dt_idx = CJ(i = names(mylist), j = names(mylist))[j >= i]
# apply function (here merge) by i, j:
dt_idx[,
       j = merge(x = mylist[[i]], y = mylist[[j]], by = c('start', 'stop', 'ID')),
       by = list(i, j)]
```
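Since the question is about `foverlaps` rather than `merge`, the same pair table can drive an overlap join. The following is only a minimal sketch under a few assumptions: the intervals live in the `start` and `stop` columns, keying a copy of each table is acceptable, and `overlap_pair` is a hypothetical helper name (it is not part of data.table). `set.seed` is added only so the random scores are reproducible.

```r
library(data.table)

set.seed(1)  # assumption: only here to make the sampled scores reproducible
mylist <- list(
  a = data.table(start = seq(1, by = 9,  len = 10), stop = seq(6, by = 9,  len = 10),
                 ID = letters[1:10], score = sample(1:25, 10)),
  b = data.table(start = seq(2, by = 11, len = 10), stop = seq(8, by = 11, len = 10),
                 ID = letters[1:10], score = sample(1:25, 10)),
  c = data.table(start = seq(4, by = 11, len = 10), stop = seq(9, by = 11, len = 10),
                 ID = letters[1:10], score = sample(1:25, 10))
)

# hypothetical helper: overlap join of x against y on (start, stop)
overlap_pair <- function(x, y) {
  # foverlaps() requires y to be keyed; key a copy so mylist is not modified
  y <- setkeyv(copy(y), c("start", "stop"))
  foverlaps(x, y, by.x = c("start", "stop"))
}

# upper triangle including the diagonal:
# (a,a) (a,b) (a,c) (b,b) (b,c) (c,c)
pairs <- CJ(i = names(mylist), j = names(mylist))[j >= i]

# one overlap result per pair, named "a_a", "a_b", ...
res <- Map(function(i, j) overlap_pair(mylist[[i]], mylist[[j]]), pairs$i, pairs$j)
names(res) <- paste(pairs$i, pairs$j, sep = "_")
```

Each element of `res` is then the overlap of one pair, e.g. `res$a_b`, and no pair appears twice. With the default `nomatch`, rows of `x` without any overlap are kept and filled with `NA` for the columns of `y`.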

**Note:**
In case the `>=` comparison on the list names isn't meaningful (because the names are unordered and/or more complicated), you can always use an `integer` index and apply the same logic:

```
dt_idx = CJ(i = seq.int(length(mylist)), j = seq.int(length(mylist)))[j >= i]
```
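As a quick sanity check, the integer pair table can be compared with the upper triangle of an actual square matrix using base R's `upper.tri`. This is just an illustrative sketch: `n` stands in for `length(mylist)`, and `tri`, `from_cj`, and `from_tri` are hypothetical names introduced here.

```r
library(data.table)

n <- 3L  # assumption: stands in for length(mylist)

# integer version of the pair table: keep rows with j >= i
dt_idx <- CJ(i = seq_len(n), j = seq_len(n))[j >= i]

# base R cross-check: positions on or above the diagonal of an n x n grid
tri <- which(upper.tri(matrix(NA, n, n), diag = TRUE), arr.ind = TRUE)

# both approaches give the same n * (n + 1) / 2 pairs (row order may differ)
from_cj  <- paste(dt_idx$i, dt_idx$j)
from_tri <- paste(tri[, "row"], tri[, "col"])
stopifnot(setequal(from_cj, from_tri), nrow(dt_idx) == n * (n + 1) / 2)
```

The list elements are then looked up by position, i.e. `mylist[[i]]` and `mylist[[j]]`, exactly as with the name-based version.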