User2321 - 8 days ago 4x
R Question

# Data table add filtered count in j

I have the following data table in R:

``````set.seed(5)
my_data <- data.table(cat_1=c(1,1,1,2,2,1,1,1,3,4,5,4,5),
cat_2 = sample(c("A","B"), 13, replace = T),
ao = rnorm(13,500,10))
``````

And I would like to know the number of rows per cat_1, the sum of ao per cat_1, and the number of As in cat_2 per cat_1. So ideally I would like to get this:

``````merge(my_data[, .(cat1_lines = .N, total_ao = sum(ao, na.rm = T)), by = cat_1],
my_data[cat_2 == "A", .(A_lines = .N), by = cat_1], by = "cat_1", all.x = T)

cat_1 cat1_lines  total_ao A_lines
1:     1          6 3015.5034       1
2:     2          2 1015.8838       2
3:     3          1  516.9518      NA
4:     4          2  984.0768       2
5:     5          2  983.8361       2
``````

Is there a way of doing this in the same by statement without having to merge? Something like (I know this does not work):

``````my_data[, .(cat1_lines = .N, A_lines = .N[cat_2 == "A"],
total_ao = sum(ao, na.rm = T)), by = cat_1]
``````

You can easily do this with a `by` statement in your `data.table`. Try this:

``````my_data[,.(cat1_lines=.N,total_ao=sum(ao),A_lines=sum(cat_2=="A")),by=.(cat_1)]

cat_1 cat1_lines  total_ao A_lines
1:     1          6 3015.5034       1
2:     2          2 1015.8838       2
3:     3          1  516.9518       0
4:     4          2  984.0768       2
5:     5          2  983.8361       2
``````
Source (Stackoverflow)