Thanh Quang - 8 months ago 62

R Question

i have a question relating to data.table in R.

I am working on an acceleration data that requires me to generate features from the raw data. I want to group data by each 2 second. It is easy by generating 1 more column to indicate groups for each 2 second and group with by.

However, i want do the overlapping windows. For example, my raw data is this

`a=data.table(x = c(1:10), y= c(2:11), z = c(5), second=rep(c(1:5),each=2))`

x y z second

1: 1 2 5 1

2: 2 3 5 1

3: 3 4 5 2

4: 4 5 5 2

5: 5 6 5 3

6: 6 7 5 3

7: 7 8 5 4

8: 8 9 5 4

9: 9 10 5 5

10: 10 11 5 5

Now, i want to calculate the mean of x,y,z column by each 2 seconds. 1and2, 2 and 3, 3 and 4, 4 and 5.

I can run the for loops but since i have a huge dataset, it will take a long time. Do you know how do to it with just data table tools?

Thanks so much

Answer

As there is only 2 unique observations for 'second', we get the `lead`

of the 'x', 'y', 'z' columns, grouped by 'second', `unlist`

the Subset of Data.table and get the `mean`

.

```
nm1 <- c("x", "y", "z")
na.omit(a[, paste0(nm1, 2) := lapply(.SD, function(x) shift(x, 2,
type = "lead")), .SDcols = nm1])[, .(Mean = mean(unlist(.SD))),
.(second = paste0(second, "-", second + 1))]
# second Mean
#1: 1-2 3.666667
#2: 2-3 5.000000
#3: 3-4 6.333333
#4: 4-5 7.666667
```

Or a slightly more compact option would be

```
library(dplyr)
cbind(a[second!= last(second)], a[second!= first(second)])[
,.(Mean = mean(unlist(.SD))), .(second = paste0(second, "-", second+1))]
# second Mean
#1: 1-2 3.666667
#2: 2-3 5.000000
#3: 3-4 6.333333
#4: 4-5 7.666667
```

Or another option would be place them in a `list`

, `rbind`

the dataset, create a new 'id1' column, get the `mean`

after `unlist`

ing the `.SDcols`

or we can get the individual `mean`

of each column

```
dt1 <- rbindlist(list(a[second!= last(second)],
a[second!= first(second)]), idcol=TRUE)[, id1:= as.numeric(gl(.N, 2, .N)), .id][]
```

Get the `mean`

for each column by 'second'

```
dt1[, lapply(.SD, mean), .(second = paste0(id1, "-", id1 + 1)), .SDcols = x:z]
```

Get the whole `mean`

by 'second'

```
dt1[, mean(unlist(.SD)), .(second = paste0(id1, "-", id1 +1)), .SDcols = x:z]
```