User2321 User2321 - 1 month ago 6
R Question

Data table add filtered count in j

I have the following data table in R:

set.seed(5)
my_data <- data.table(cat_1=c(1,1,1,2,2,1,1,1,3,4,5,4,5),
cat_2 = sample(c("A","B"), 13, replace = T),
ao = rnorm(13,500,10))


And I would like to know the number of rows per cat_1, the sum of ao per cat_1, and the number of As in cat_2 per cat_1. So ideally I would like to get this:

merge(my_data[, .(cat1_lines = .N, total_ao = sum(ao, na.rm = T)), by = cat_1],
my_data[cat_2 == "A", .(A_lines = .N), by = cat_1], by = "cat_1", all.x = T)

cat_1 cat1_lines total_ao A_lines
1: 1 6 3015.5034 1
2: 2 2 1015.8838 2
3: 3 1 516.9518 NA
4: 4 2 984.0768 2
5: 5 2 983.8361 2


Is there a way of doing this in the same by statement without having to merge? Something like (I know this does not work):

my_data[, .(cat1_lines = .N, A_lines = .N[cat_2 == "A"],
total_ao = sum(ao, na.rm = T)), by = cat_1]

Answer

You can easily do this with a by statement in your data.table. Try this:

my_data[,.(cat1_lines=.N,total_ao=sum(ao),A_lines=sum(cat_2=="A")),by=.(cat_1)]

   cat_1 cat1_lines  total_ao A_lines
1:     1          6 3015.5034       1
2:     2          2 1015.8838       2
3:     3          1  516.9518       0
4:     4          2  984.0768       2
5:     5          2  983.8361       2