Asked by iembry, 11 months ago

R Question

I asked a previous question (r - data.table - Why is the data.table result 1 numeric value when it should be rounded to 3 decimal places?) about `data.table` grouping with `by = cyl` compared with `dplyr`.

How can I obtain the same result (see the `dplyr` code and output below) using `data.table`?

```
# using dplyr
library(dplyr)
# assumption: mtcars1 is not shown in the question; the output lists the 4-cylinder cars first
mtcars1 <- arrange(mtcars, cyl)

mtcars1 %>%
  group_by(cyl) %>%
  select(disp) %>%
  mutate(displace = round(disp / sum(disp), digits = 3))
# Adding missing grouping variables: `cyl`
# Source: local data frame [32 x 3]
# Groups: cyl [3]
#
#      cyl  disp displace
#    <dbl> <dbl>    <dbl>
# 1      4 108.0    0.093
# 2      4 146.7    0.127
# 3      4 140.8    0.122
# 4      4  78.7    0.068
# 5      4  75.7    0.065
# 6      4  71.1    0.061
# 7      4 120.1    0.104
# 8      4  79.0    0.068
# 9      4 120.3    0.104
# 10     4  95.1    0.082
# # ... with 22 more rows
```

I have tried this (see the previous post mentioned above):

```
# Group cars by number of cylinders and the computed share of displacement
# using data.table
library(data.table)
mtcars2 <- as.data.table(mtcars)  # assumption: not shown in the question

setkey(mtcars2, "cyl")
mtcars2[ , .(displace = round(disp / sum(disp), digits = 3)), by = list(cyl, disp)]
#     cyl  disp displace
#  1:   4 108.0        1
#  2:   4 146.7        1
#  3:   4 140.8        1
#  4:   4  78.7        1
#  5:   4  75.7        1
#  6:   4  71.1        1
#  7:   4 120.1        1
#  8:   4  79.0        1
#  9:   4 120.3        1
# 10:   4  95.1        1
# ...
#     cyl  disp displace
```

This doesn't work here (although it worked in How to group data.table by multiple columns?):

```
mtcars2[ , displace = round(disp / sum(disp), digits = 3), by = list(cyl, disp)]
# Error in `[.data.table`(mtcars2, , displace = round(disp/sum(disp), digits = 3),  :
#   unused argument (displace = round(disp/sum(disp), digits = 3))
```
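A minimal sketch of the assignment form, assuming `mtcars2` is an unkeyed `data.table` copy of `mtcars` (the question does not show how it was created). The error occurs because `displace = ...` is matched as a named argument of `[.data.table`, which has no argument of that name; adding a column from `j` is done with the `:=` operator. Grouping by `cyl` alone keeps `sum(disp)` equal to the per-cylinder total, whereas `by = list(cyl, disp)` leaves only one distinct `disp` value in each group, so the ratio collapses, as in the all-1 output above.

```r
library(data.table)

# Assumption: mtcars2 is an unkeyed data.table copy of the built-in mtcars
mtcars2 <- as.data.table(mtcars)

# := adds displace to mtcars2 by reference; by = cyl (not cyl and disp)
# makes sum(disp) the total displacement of each cylinder group
mtcars2[, displace := round(disp / sum(disp), digits = 3), by = cyl]

# Show the three columns of interest (all other columns are kept as well)
print(mtcars2[, .(cyl, disp, displace)])
```

Because `:=` modifies `mtcars2` in place, call `copy(mtcars2)` first if the original table must stay unchanged.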

This doesn't provide all of the columns that I want (as suggested in r - data.table - Why is the data.table result 1 numeric value when it should be rounded to 3 decimal places?):

```
mtcars2[ , .(displace = round(disp / sum(disp), digits = 3)), by = cyl]
```
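If the aim is to keep every column rather than only `disp`, one common idiom (a sketch, again assuming `mtcars2` is an unkeyed `data.table` copy of `mtcars`) combines `.SD`, the per-group subset of all non-grouping columns, with the new column in `j`:

```r
library(data.table)

# Assumption: mtcars2 is an unkeyed data.table copy of the built-in mtcars
mtcars2 <- as.data.table(mtcars)

# .SD holds all non-grouping columns for the current cyl group;
# c() appends displace, computed from that group's disp values
res <- mtcars2[, c(.SD, list(displace = round(disp / sum(disp), digits = 3))), by = cyl]
head(res)
```

The result has `cyl` first, then the remaining ten `mtcars` columns, then `displace`, with one row per car.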

Thank you.

Answer

When using the summary syntax in `data.table`, i.e., not using `:=`, you can include columns in your result by adding them to the list in `j`:

```
library(data.table)
mtcars2 <- as.data.table(mtcars)  # assumption: an unkeyed copy of mtcars

mtcars2[,.(displace = round(disp / sum(disp), digits = 3), disp), by = cyl]
# cyl displace disp
# 1: 6 0.125 160.0
# 2: 6 0.125 160.0
# 3: 6 0.201 258.0
# 4: 6 0.175 225.0
# 5: 6 0.131 167.6
# 6: 6 0.131 167.6
# 7: 6 0.113 145.0
# ...
```
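To match the `dplyr` output more closely (columns in the order `cyl`, `disp`, `displace`, with the 4-cylinder cars first), a sketch, assuming `mtcars2` is an unkeyed `data.table` copy of `mtcars`, lists `disp` before `displace` in `j` and uses `keyby` so the groups come out sorted by `cyl`:

```r
library(data.table)

# Assumption: mtcars2 is an unkeyed data.table copy of the built-in mtcars
mtcars2 <- as.data.table(mtcars)

# keyby = cyl sorts the groups (4, 6, 8); rows keep their original order within each group.
# disp is listed first in j so the columns come out as cyl, disp, displace.
res <- mtcars2[, .(disp, displace = round(disp / sum(disp), digits = 3)), keyby = cyl]
print(res)
```

Unlike `setkey(mtcars2, "cyl")`, `keyby` sorts only the result and leaves `mtcars2` itself untouched.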