Pilik - 1 year ago 55

R Question

I want to add many new columns simultaneously to a `data.table` like this one:

             Time Stock x1 x2 x3
    1: 2014-08-22     A 15 27 34
    2: 2014-08-23     A 39 44 29
    3: 2014-08-24     A 20 50  5
    4: 2014-08-22     B 42 22 43
    5: 2014-08-23     B 44 45 12
    6: 2014-08-24     B  3 21  2

Now I want to `scale` and `sum` the columns `x2` and `x3` within each `Stock`, to get:

             Time Stock x1 x2 x3   x2_scale   x3_scale x2_sum x3_sum
    1: 2014-08-22     A 15 27 34 -1.1175975  0.7310560    121     68
    2: 2014-08-23     A 39 44 29  0.3073393  0.4085313    121     68
    3: 2014-08-24     A 20 50  5  0.8102582 -1.1395873    121     68
    4: 2014-08-22     B 42 22 43 -0.5401315  1.1226726     88     57
    5: 2014-08-23     B 44 45 12  1.1539172 -0.3274462     88     57
    6: 2014-08-24     B  3 21  2 -0.6137858 -0.7952265     88     57

A brute force implementation of my problem would be:

    library(data.table)
    set.seed(123)

    # Three consecutive dates, repeated once per stock
    d <- data.table(Time  = rep(seq(Sys.Date(), length.out = 3, by = "day"), times = 2),
                    Stock = rep(LETTERS[1:2], each = 3),
                    x1    = sample(1:50, 6),
                    x2    = sample(1:50, 6),
                    x3    = sample(1:50, 6))

    # One := call per new column, grouped by Stock
    d[, x2_scale := scale(x2), by = Stock]
    d[, x3_scale := scale(x3), by = Stock]
    d[, x2_sum := sum(x2), by = Stock]
    d[, x3_sum := sum(x3), by = Stock]

Other posts describing a similar issue (Add multiple columns to R data.table in one function call? and assign multiple columns using := in data.table, by group) suggest the following solution:

    d[, c("x2_scale", "x3_scale") := list(scale(x2), scale(x3)), by = Stock]
    d[, c("x2_sum", "x3_sum") := list(sum(x2), sum(x3)), by = Stock]

But again, this gets very messy with a lot of variables, and it also raises an error message with `scale` (though not with `sum`).

Is there a more efficient way to achieve the required result (keeping in mind that my actual data set is quite large)?

Answer

I think that with a small modification to your last code you can easily do both for as many variables as you want:

```
vars <- c("x2", "x3") # <- Choose the variable you want to operate on
d[, paste0(vars, "_", "scale") := lapply(.SD, function(x) scale(x)[, 1]), .SDcols = vars, by = Stock]
d[, paste0(vars, "_", "sum") := lapply(.SD, sum), .SDcols = vars, by = Stock]
## Time Stock x1 x2 x3 x2_scale x3_scale x2_sum x3_sum
## 1: 2014-08-22 A 13 14 32 -1.1338934 1.1323092 87 44
## 2: 2014-08-23 A 25 39 9 0.7559289 -0.3701780 87 44
## 3: 2014-08-24 A 18 34 3 0.3779645 -0.7621312 87 44
## 4: 2014-08-22 B 44 8 6 -0.4730162 -0.7258662 59 32
## 5: 2014-08-23 B 49 3 18 -0.6757374 1.1406469 59 32
## 6: 2014-08-24 B 15 48 8 1.1487535 -0.4147807 59 32
```
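The `[, 1]` after `scale(x)` matters because `scale()` returns a one-column matrix rather than a plain vector, and indexing the first column turns it back into an ordinary numeric vector that `:=` can store as a column. A minimal sketch of that behaviour in base R, using the `x2` values of stock A from the question's example table (the variable names `x`, `s`, `v` and `manual` are only illustrative):

```r
x <- c(27, 44, 50)        # x2 values for stock A in the question's table

s <- scale(x)
is.matrix(s)              # TRUE: scale() returns a matrix
dim(s)                    # 3 1

v <- s[, 1]               # first column as a plain numeric vector
is.null(dim(v))           # TRUE: no matrix dimensions left

# Same result as standardizing by hand
manual <- (x - mean(x)) / sd(x)
all.equal(v, manual)      # TRUE
```
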

For simple functions (that don't need special treatment like `scale`) you could easily do something like

```
vars <- c("x2", "x3") # <- Define the variables you want to operate on
funs <- c("min", "max", "mean", "sum") # <- Define your functions (by name)
for (f in funs) {
  # match.fun() looks the function up by its name; one new column per variable
  d[, paste0(vars, "_", f) := lapply(.SD, match.fun(f)), .SDcols = vars, by = Stock]
}
```
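If some of the functions do need special treatment, one possible variation is to loop over a named list of functions instead of a vector of names, so that a wrapper such as the one for `scale` sits alongside the plain functions. This is only a sketch: the list layout and the loop variable `nm` are assumptions made for illustration, and the data is typed in from the question's example table rather than sampled.

```r
library(data.table)

# x2 and x3 copied from the example table in the question
d <- data.table(
  Stock = rep(c("A", "B"), each = 3),
  x2    = c(27, 44, 50, 22, 45, 21),
  x3    = c(34, 29, 5, 43, 12, 2)
)

vars <- c("x2", "x3")

# Named list: the name becomes the column suffix, the value is the function
funs <- list(
  scale = function(x) scale(x)[, 1],  # drop the matrix returned by scale()
  sum   = sum
)

for (nm in names(funs)) {
  d[, paste0(vars, "_", nm) := lapply(.SD, funs[[nm]]), .SDcols = vars, by = Stock]
}

print(d)
```

With this input, `x2_sum` is 121 for stock A and 88 for stock B, which matches the desired output shown in the question. Each loop iteration still makes one grouped pass over the data, so the cost grows with the number of functions rather than with the number of variables.
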