Dong - 1 year ago 99
R Question

# Count the length of number sequences

Sample data containing some runs of consecutive numbers, `c(4, 5, 6)` and `c(10, 11)`:

```r
df <- data.frame(x = c(2, 4, 5, 6, 8, 10, 11))
```

What I want is a new column that counts, for each row, its position within the current sequence:

```
> df
   x cnt
1  2   1
2  4   1
3  5   2
4  6   3
5  8   1
6 10   1
7 11   2
```

It would be simple to first assign `df$cnt[1] = 1` and then, for the second row onward, either increment the count or reset it to `1` depending on whether consecutive values in `df$x` meet a certain criterion (here `x[i] - x[i-1] == 1`). I am just not sure a loop is the way to go in `R`, and I also need to handle groups.
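For reference, the loop described above can be written in base R roughly as follows. This is a minimal sketch on the sample data, assuming the criterion `x[i] - x[i-1] == 1`; it is easy to follow but typically slower than vectorised approaches on large data.

```r
# Sample data from the question
df <- data.frame(x = c(2, 4, 5, 6, 8, 10, 11))

# Start every row at 1 (this also covers the first row)
df$cnt <- rep(1L, nrow(df))

# From the second row on: continue the run if x went up by exactly 1,
# otherwise keep the reset value of 1
for (i in seq_len(nrow(df))[-1]) {
  if (df$x[i] - df$x[i - 1] == 1) {
    df$cnt[i] <- df$cnt[i - 1] + 1L
  }
}

df
```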

I can create a new column that checks whether each row continues a sequence. From there I could probably use `rle` to calculate the run lengths and generate the `cnt` column, but I am not sure how to handle the `NA` in the first row.

```
> df %>% mutate(check=(x-lag(x)==1))
   x check
1  2    NA
2  4 FALSE
3  5  TRUE
4  6  TRUE
5  8 FALSE
6 10 FALSE
7 11  TRUE
```
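One way to finish the `rle` idea in base R, as a minimal sketch on the sample data (assuming at least one row): put `FALSE` instead of `NA` in the first position, turn the flags into run ids with `cumsum`, and let `sequence()` expand the run lengths into running counts.

```r
df <- data.frame(x = c(2, 4, 5, 6, 8, 10, 11))

# TRUE where a row continues the previous run; the first row has no
# predecessor, so use FALSE there instead of NA
check <- c(FALSE, diff(df$x) == 1)

# Each FALSE starts a new run, so cumsum(!check) numbers the runs: 1 2 2 2 3 4 4
run_id <- cumsum(!check)

# rle() gives the run lengths (1, 3, 1, 2); sequence() expands them into
# 1, 1 2 3, 1, 1 2
df$cnt <- sequence(rle(run_id)$lengths)

df
```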

Is this the way to go? Please suggest solutions using `dplyr` or `data.table`.

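dplyr. Give `lag()` a `default` value (here the first element, `x[1L]`) so the first row is no longer `NA`; then flag the rows that start a new run, turn those flags into run ids with `cumsum`, and number the rows within each run: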

```r
library(dplyr)

df %>%
  mutate(check = x - lag(x, default = x[1L]) != 1) %>%  # TRUE where a new run starts
  group_by(g = cumsum(check)) %>%                       # run id
  mutate(cnt = row_number()) %>%                        # position within the run
  ungroup() %>%
  select(-g, -check)

      x   cnt
  <dbl> <int>
1     2     1
2     4     1
3     5     2
4     6     3
5     8     1
6    10     1
7    11     2
```
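The question also asks about groups. A possible extension of the dplyr answer, sketched with a made-up grouping column `id` and made-up values (neither comes from the question): compute the run number inside each `id`, then count rows within each `(id, run)` pair, so a run can never continue across two groups.

```r
library(dplyr)

# Hypothetical grouped data: `id` and these values are illustrative only
df2 <- data.frame(
  id = c("a", "a", "a", "b", "b", "b", "b"),
  x  = c(1, 2, 3, 3, 4, 6, 7)
)

res <- df2 %>%
  group_by(id) %>%
  mutate(
    new_run = x - lag(x, default = first(x)) != 1,  # TRUE at the first row of each run
    run     = cumsum(new_run)                        # run number within this id
  ) %>%
  group_by(id, run) %>%
  mutate(cnt = row_number()) %>%                     # position within the run
  ungroup() %>%
  select(-new_run, -run)

res
```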

data.table. Along the same lines and more concisely:

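```r
library(data.table)
setDT(df)  # convert df to a data.table in place

# x != shift(x) + 1 is TRUE where a new run starts; cumsum turns that into run ids
df[, cnt := 1:.N, by = cumsum(x != shift(x, fill = x[1L]) + 1L)]
df

    x cnt
1:  2   1
2:  4   1
3:  5   2
4:  6   3
5:  8   1
6: 10   1
7: 11   2
```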

`shift` is data.table's analogue of dplyr's `lag`; its `fill` argument plays the role of `default`.
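To see what the `by` expression in the data.table answer computes, it can help to evaluate the pieces on the sample vector. Note that `+` binds more tightly than `!=`, so the condition reads `x != (shift(x, fill = x[1L]) + 1L)`: TRUE wherever a value is not exactly one more than its predecessor, i.e. where a new run starts.

```r
library(data.table)

x <- c(2, 4, 5, 6, 8, 10, 11)

# Previous value, with the first slot filled by x[1] instead of NA
prev <- shift(x, fill = x[1L])          # 2 2 4 5 6 8 10

# Parsed as x != (prev + 1L): TRUE where a new run starts
new_run <- x != prev + 1L               # TRUE TRUE FALSE FALSE TRUE TRUE FALSE

# Running total of run starts = the run id used for grouping
run_id <- cumsum(new_run)               # 1 2 2 2 3 4 4
```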

Alternatively, from version 1.9.7 of data.table onward, you can use `rowid` instead:

```r
df[, cnt := rowid(cumsum(x != shift(x, fill=x[1L]) + 1L))]
```
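For the grouped case, the `rowid` version extends naturally, since `rowid()` accepts several columns. A minimal sketch, assuming a hypothetical grouping column `id` with made-up values: compute run ids within each `id` using `by = id`, so `shift()` never compares a row with the previous group, then number rows within each `(id, run)` pair.

```r
library(data.table)

# Hypothetical grouped data: `id` and these values are illustrative only
dat <- data.table(
  id = c("a", "a", "a", "b", "b", "b", "b"),
  x  = c(1, 2, 3, 3, 4, 6, 7)
)

# Run ids computed separately within each id (x and x[1L] refer to the group's rows)
dat[, run := cumsum(x != shift(x, fill = x[1L]) + 1L), by = id]

# rowid() numbers rows within each (id, run) combination
dat[, cnt := rowid(id, run)]
dat[, run := NULL]  # drop the helper column

dat
```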