Dong Dong - 1 month ago 9
R Question

Count the length of number sequences

Sample data containing some arithmetic sequences c(4,5,6) and c(10,11).

df <- data.frame(x = c(2, 4, 5, 6, 8, 10, 11))


What I want is a new column that holds a running count within each sequence, such as:

> df
x cnt
1 2 1
2 4 1
3 5 2
4 6 3
5 8 1
6 10 1
7 11 2


It would be simple to first assign df$cnt[1] = 1, then for the second row and beyond either increment the count or reset it to 1, depending on whether consecutive values in df$x meet a given criterion (here x[i] - x[i-1] == 1). I am just not sure a loop is the way to go in R, and I also need to handle groups.
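For reference, the loop described above could look like the following base R sketch. It assumes the data frame has at least one row; the intermediate vector cnt is introduced here only for illustration.

```r
df <- data.frame(x = c(2, 4, 5, 6, 8, 10, 11))

# Pre-allocate the counter; the first row always starts a new sequence.
cnt <- integer(nrow(df))
cnt[1] <- 1L

# From the second row on: increment if x continues the sequence, else reset to 1.
for (i in seq_len(nrow(df))[-1]) {
  cnt[i] <- if (df$x[i] - df$x[i - 1] == 1) cnt[i - 1] + 1L else 1L
}

df$cnt <- cnt
df$cnt
# 1 1 2 3 1 1 2
```

This works, but it processes one row at a time, which is why a vectorised approach is usually preferred in R.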

I can create a new column that checks whether each row continues a sequence. From there, I could probably use rle to calculate the run lengths and generate the cnt column, though I am not sure how to handle the NA.
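One possible way to make the rle idea work, sketched in base R: treat the leading NA as FALSE (the first row always starts a new sequence), turn the check into a run identifier with cumsum, and expand the run lengths with sequence. The names check and run_id are illustrative, not part of the original post.

```r
x <- c(2, 4, 5, 6, 8, 10, 11)

# Same information as x - lag(x) == 1: NA for the first row.
check <- c(NA, diff(x) == 1)

# The first row has no predecessor, so it starts a new sequence.
check[is.na(check)] <- FALSE

# Every FALSE opens a new run, so the cumulative sum of !check labels the runs.
run_id <- cumsum(!check)              # 1 2 2 2 3 4 4

# rle gives the length of each run; sequence expands them to 1..n per run.
cnt <- sequence(rle(run_id)$lengths)
cnt
# 1 1 2 3 1 1 2
```

Running rle directly on check would not give the right runs, because a FALSE followed by TRUEs belongs to the same sequence; that is why the sketch counts on run_id instead.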

> df %>% mutate(check=(x-lag(x)==1))
x check
1 2 NA
2 4 FALSE
3 5 TRUE
4 6 TRUE
5 8 FALSE
6 10 FALSE
7 11 TRUE


Is this the way to go? Please suggest solutions using dplyr or data.table.

Answer

dplyr. Set the default value of lag so the first row does not produce NA, and it will work:

library(dplyr)

# check is TRUE wherever a new run starts; cumsum(check) then labels each run
df %>% mutate(check = x - lag(x, default = x[1L]) != 1) %>%
  group_by(g = cumsum(check)) %>%
  mutate(cnt = row_number()) %>%
  ungroup() %>% select(-g, -check)

      x   cnt
  <dbl> <int>
1     2     1
2     4     1
3     5     2
4     6     3
5     8     1
6    10     1
7    11     2
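To see why the default value matters, the intermediate vectors can be reproduced in base R. This is only an illustrative breakdown: prev stands in for what lag(x, default = x[1L]) returns, and ave with seq_along plays the role of the grouped row_number().

```r
x <- c(2, 4, 5, 6, 8, 10, 11)

# Previous value of x, with the first element filled by x[1] instead of NA.
prev <- c(x[1], head(x, -1))   # 2 2 4 5 6 8 10

# The first difference is 0, so the first row is flagged as a new run too.
check <- x - prev != 1         # TRUE TRUE FALSE FALSE TRUE TRUE FALSE

# Each TRUE bumps the counter, giving one id per run.
g <- cumsum(check)             # 1 2 2 2 3 4 4

# Number the rows within each run.
cnt <- ave(x, g, FUN = seq_along)
cnt
# 1 1 2 3 1 1 2
```

Without the default, the first element of check would be NA and cumsum would propagate NA to every row.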

data.table. Along the same lines and more concisely:

library(data.table)
setDT(df)

df[, cnt := 1:.N, by=cumsum(x != shift(x, fill=x[1L]) + 1L)]

    x cnt
1:  2   1
2:  4   1
3:  5   2
4:  6   3
5:  8   1
6: 10   1
7: 11   2

shift is data.table's analogue to lag.

Alternately, from v1.9.7 of the package on, you're able to use rowid instead:

df[, cnt := rowid(cumsum(x != shift(x, fill=x[1L]) + 1L))]
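
The question also mentions groups, which the code above does not cover. A minimal base R sketch of one way to handle them follows; the grouping column grp, its values, and the helper run_cnt are hypothetical and only serve the illustration. The key point is that the run detection is restarted inside each group, so a sequence never continues across a group boundary.

```r
# Hypothetical grouped data: grp is an assumed column, not from the question.
df <- data.frame(grp = c("a", "a", "a", "b", "b", "b", "b"),
                 x   = c(2, 4, 5, 6, 8, 10, 11))

# Running count within runs of consecutive integers for one group's values.
run_cnt <- function(x) {
  run_id <- cumsum(c(TRUE, diff(x) != 1))  # first value always starts a run
  ave(x, run_id, FUN = seq_along)
}

# Apply the helper separately within each group.
df$cnt <- ave(df$x, df$grp, FUN = run_cnt)
df$cnt
# 1 1 2 1 1 1 2
```

Here 5 (end of group a) and 6 (start of group b) are consecutive numbers, but the count resets because they belong to different groups.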