Simon Z. - 1 year ago 74
R Question

# Count number of records and generate row number within each group

I have the following data.table

``````set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
VAL
1:   1
2:   2
3:   2
4:   3
5:   1
6:   3
7:   3
8:   2
9:   2
10:   1
``````

Within each number in
`VAL`
I want to:

1. Count the number of records/rows

2. Create an row index (counter) of first, second, third occurrence et c.

At the end I want the result

``````    VAL COUNT IDX
1:   1     3   1
2:   2     4   1
3:   2     4   2
4:   3     3   1
5:   1     3   2
6:   3     3   2
7:   3     3   3
8:   2     4   3
9:   2     4   4
10:   1     3   3
``````

where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with
`which`
and
`length`
using
`.I`
:

`````` dt[, list(COUNT = length(VAL == VAL[.I]),
IDX = which(which(VAL == VAL[.I]) == .I))]
``````

but this does not work as
`.I`
refers to a vector with the index, so I guess one must use
`.I[]`
. Though inside
`.I[]`
I again face the problem, that I do not have the row index and I do know (from reading
`data.table`
FAQ and following the posts here) that looping through rows should be avoided if possible.

So, what's the
`data.table`
way?

Using `.N`...

``````DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
#    VAL COUNT IDX
# 1:   1     3   1
# 2:   2     4   1
# 3:   2     4   2
# 4:   3     3   1
# 5:   1     3   2
# 6:   3     3   2
# 7:   3     3   3
# 8:   2     4   3
# 9:   2     4   4
#10:   1     3   3
``````

`.N` is the number of records in each group, with groups defined by `"VAL"`.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download