iembry iembry - 28 days ago 9
R Question

r - data.table - Group data.table result by multiple columns with rounding

I asked a previous question (r - data.table - Why is the data.table result 1 numeric value when it should be rounded to 3 decimal places?) about data.table and how it displays a numeric result. The comments there suggest that I just use by = cyl, but that won't give me the dplyr result, so I'm asking a new question here.

How can I obtain the same result with data.table (see the dplyr code below)?

# using dplyr
library(dplyr)

mtcars1 %>%
  group_by(cyl) %>%
  select(disp) %>%
  mutate(displace = round(disp / sum(disp), digits = 3))

# Adding missing grouping variables: `cyl`
# Source: local data frame [32 x 3]
# Groups: cyl [3]
#
#      cyl  disp displace
#    <dbl> <dbl>    <dbl>
# 1      4 108.0    0.093
# 2      4 146.7    0.127
# 3      4 140.8    0.122
# 4      4  78.7    0.068
# 5      4  75.7    0.065
# 6      4  71.1    0.061
# 7      4 120.1    0.104
# 8      4  79.0    0.068
# 9      4 120.3    0.104
# 10     4  95.1    0.082
# # ... with 22 more rows


I have tried this (see the previous post mentioned above):

# Group cars by number of cylinders and the computed share of displacement
# using data.table
setkey(mtcars2, "cyl")
mtcars2[ , .(displace = round(disp / sum(disp), digits = 3)), by = list(cyl, disp)]

#     cyl  disp displace
#  1:   4 108.0        1
#  2:   4 146.7        1
#  3:   4 140.8        1
#  4:   4  78.7        1
#  5:   4  75.7        1
#  6:   4  71.1        1
#  7:   4 120.1        1
#  8:   4  79.0        1
#  9:   4 120.3        1
# 10:   4  95.1        1
#     cyl  disp displace


This doesn't work here either (although the same form worked in: How to group data.table by multiple columns?):

mtcars2[ , displace = round(disp / sum(disp), digits = 3), by = list(cyl, disp)]

# Error in `[.data.table`(mtcars2, , displace = round(disp/sum(disp), digits = 3), :
# unused argument (displace = round(disp/sum(disp), digits = 3))


This doesn't provide all of the columns that I want (as suggested in r - data.table - Why is the data.table result 1 numeric value when it should be rounded to 3 decimal places?):

mtcars2[ , .(displace = round(disp / sum(disp), digits = 3)), by = cyl]


Thank you.

Answer

When you use the summary syntax in data.table, i.e., when you are not using :=, you can keep extra columns in the result by adding them to the list in the j position:

mtcars2[,.(displace = round(disp / sum(disp), digits = 3), disp), by = cyl]

#    cyl displace  disp
# 1:   6    0.125 160.0
# 2:   6    0.125 160.0
# 3:   6    0.201 258.0
# 4:   6    0.175 225.0
# 5:   6    0.131 167.6
# 6:   6    0.131 167.6
# 7:   6    0.113 145.0
# ...
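
For comparison, here is a minimal sketch that puts the approaches from the question and this answer side by side. It assumes mtcars2 is simply a data.table copy of the built-in mtcars data set (the question does not show how it was created), and the names by_both, by_cyl and sel are only illustrative. As far as I can tell, the attempt in the question returns 1 everywhere because a column listed in by is a single value per group inside j, so disp / sum(disp) divides the value by itself.

```r
library(data.table)

# Assumption: mtcars2 is a data.table copy of the built-in mtcars data set
mtcars2 <- as.data.table(mtcars)

# Grouping by both cyl and disp: inside j, disp is the single group value,
# so disp / sum(disp) is always 1 (one row per cyl/disp combination)
by_both <- mtcars2[, .(displace = round(disp / sum(disp), digits = 3)),
                   by = .(cyl, disp)]

# Grouping by cyl alone and listing disp in j keeps disp in the result,
# with the columns in the same order as the dplyr output
by_cyl <- mtcars2[, .(disp, displace = round(disp / sum(disp), digits = 3)),
                  by = cyl]

# Alternative with := : add displace by reference (all rows and columns
# are kept, similar to dplyr's mutate), then pick the columns of interest
mtcars2[, displace := round(disp / sum(disp), digits = 3), by = cyl]
sel <- mtcars2[, .(cyl, disp, displace)]
```

The := form modifies mtcars2 in place, so use copy(mtcars2) first if the original table must stay unchanged. Note that by_cyl is ordered by group (in order of first appearance of each cyl value), while sel keeps the original row order.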