Simon Z. Simon Z. - 2 months ago 9
R Question

Count number of records and generate row number within each group

I have the following data.table

set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
VAL
1: 1
2: 2
3: 2
4: 3
5: 1
6: 3
7: 3
8: 2
9: 2
10: 1


Within each number in
VAL
I want to:


  1. Count the number of records/rows

  2. Create an row index (counter) of first, second, third occurrence et c.



At the end I want the result

VAL COUNT IDX
1: 1 3 1
2: 2 4 1
3: 2 4 2
4: 3 3 1
5: 1 3 2
6: 3 3 2
7: 3 3 3
8: 2 4 3
9: 2 4 4
10: 1 3 3


where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with
which
and
length
using
.I
:

dt[, list(COUNT = length(VAL == VAL[.I]),
IDX = which(which(VAL == VAL[.I]) == .I))]


but this does not work as
.I
refers to a vector with the index, so I guess one must use
.I[]
. Though inside
.I[]
I again face the problem, that I do not have the row index and I do know (from reading
data.table
FAQ and following the posts here) that looping through rows should be avoided if possible.

So, what's the
data.table
way?

Answer

Using .N...

DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
#    VAL COUNT IDX
# 1:   1     3   1
# 2:   2     4   1
# 3:   2     4   2
# 4:   3     3   1
# 5:   1     3   2
# 6:   3     3   2
# 7:   3     3   3
# 8:   2     4   3
# 9:   2     4   4
#10:   1     3   3

.N is the number of records in each group, with groups defined by "VAL".