laxxy - 7 months ago 44
R Question

# logical operators in data.table with by-groups

I am trying to apply a logical operator to columns of data.table, which works fine without by= groups, but with by= groups something weird is happening. I'm using R 3.3.1 and data.table 1.9.6 (off CRAN).

Here is an example data set:

``````library(data.table)
# x <- matrix(rnorm(10),ncol=2)
x <- structure(c(0.323618333400921, -0.103619160085676, -0.823322722581548,
-0.0175726272468397, 2.17286179204364, 1.63465793636681, 0.258469984285221,
-0.411622451584, 0.22594247704226, 0.114427086028551), .Dim = c(5L, 2L))

dt <- data.table(x)
dt\$id <- c(1,1,1,2,2)
dt

##             V1         V2 id
## 1:  0.32361833  1.6346579  1
## 2: -0.10361916  0.2584700  1
## 3: -0.82332272 -0.4116225  1
## 4: -0.01757263  0.2259425  2
## 5:  2.17286179  0.1144271  2
``````

Let's say I would like to know if cumulative minimum (that is, the minimal value in each column either from the start of sample to current row, or from the start of the by= group to the current row) is below zero. When there are no by= groups, we are good:

``````dt[, cummin(.SD), .SDcols=1:2]
##            V1         V2
## 1:  0.3236183  1.6346579
## 2: -0.1036192  0.2584700
## 3: -0.8233227 -0.4116225
## 4: -0.8233227 -0.4116225
## 5: -0.8233227 -0.4116225

> dt[, cummin(.SD)<0, .SDcols=1:2]
##         V1    V2
## [1,] FALSE FALSE
## [2,]  TRUE FALSE
## [3,]  TRUE  TRUE
## [4,]  TRUE  TRUE
## [5,]  TRUE  TRUE
``````

So far so good. I can also compute actual cumulative minimums with by= groups:

``````dt[, cummin(.SD), by=id]
##    id          V1         V2
## 1:  1  0.32361833  1.6346579
## 2:  1 -0.10361916  0.2584700
## 3:  1 -0.82332272 -0.4116225
## 4:  2 -0.01757263  0.2259425
## 5:  2 -0.01757263  0.1144271
``````

However, if I try to compute the indicator variables as before, the output is as follows:

``````dt[, (cummin(.SD)<0), by=id]
##     id    V1
##  1:  1 FALSE
##  2:  1  TRUE
##  3:  1  TRUE
##  4:  1 FALSE
##  5:  1 FALSE
##  6:  1  TRUE
##  7:  2  TRUE
##  8:  2  TRUE
##  9:  2 FALSE
## 10:  2 FALSE
``````

Instead of 2 variables of 5 obs. each, we have one with 10. It appears that within each by-group, the variables are stacked on top of each other. How should I do this correctly?

This works with `lapply`:
``````dt[,lapply(.SD,function(x) cummin(x)<0),by="id"]