user30257 - 9 months ago 59

R Question

I am trying to find the length of the longest consecutive run of '1' values in each row of a dataframe of over 1M obs. of 11 binary variables. I have looked at a number of similar questions on here, but none deal with lengthy dataframes like mine.

I can find the consecutive runs of '1's individually row-by-row, but I'm looking for a solution that can deal with my entire dataframe a bit more elegantly.

Simple example data:

```
test <- data.frame(v1=c(1,0,1),v2=c(1,1,1),v3=c(0,1,1),v4=c(1,1,0),v5=c(1,1,1))
test

# unlist() turns the one-row data frame into an atomic vector, which rle() requires
vtest <- unlist(test[1,])
vtest

r <- rle(vtest)
r$lengths[r$values == 1]

row1_max <- max(r$lengths[r$values == 1])
row1_max
```

What's the best way for me to find the max consecutive runs of '1' for each row of my dataframe without having to find each one individually by row?

My real dataset also contains an ID# variable that identifies each record uniquely, and I ultimately want to know the max consecutive runs by ID#, so any additional help there would be much appreciated.

Thanks in advance!

Answer Source

You can use `apply` to apply a function to each row of your data frame:

```
apply(test, 1, function(x) {
  # run-length encode the row, then keep only the runs of 1s
  r <- rle(x)
  max(r$lengths[as.logical(r$values)])
})
```

This returns the maximum number of consecutive `1`s per row:

```
[1] 2 4 3
```
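
The question also asks for the result by ID, and the snippet above does not cover that. Here is a minimal sketch of one way to do it, assuming the ID lives in a column called `id` (a hypothetical name; substitute your own). The ID column is dropped before calling `apply`, because `apply` converts the data frame to a matrix and a non-numeric ID would coerce every value to character. The helper `max_run` is also an addition here: it returns 0 for a row with no `1`s, where `max()` on an empty vector would otherwise give `-Inf` with a warning. The second function, `max_run_by_row`, is not the method from the answer above but an alternative that loops over the 11 columns instead of the 1M rows, which may be noticeably faster on a long data frame; I have not benchmarked it on your data.

```r
# Example data with a hypothetical ID column named `id` (an assumption).
test <- data.frame(
  id = c("a", "b", "c", "d"),
  v1 = c(1, 0, 1, 0),
  v2 = c(1, 1, 1, 0),
  v3 = c(0, 1, 1, 0),
  v4 = c(1, 1, 0, 0),
  v5 = c(1, 1, 1, 0)
)

# Longest run of 1s in a single 0/1 vector; 0 if the vector has no 1s.
max_run <- function(x) {
  r <- rle(x)
  ones <- r$lengths[r$values == 1]
  if (length(ones) == 0) 0 else max(ones)
}

# Keep only the binary columns so apply() works on a numeric matrix.
binary_cols <- setdiff(names(test), "id")

# Row-wise approach, as in the answer above, with the ID attached.
result <- data.frame(
  id = test$id,
  max_run = apply(test[binary_cols], 1, max_run)
)

# Alternative: loop over columns and keep a running count per row.
# Expects a numeric matrix containing only 0s and 1s.
max_run_by_row <- function(m) {
  current <- numeric(nrow(m))  # length of the run ending at the current column
  best <- numeric(nrow(m))     # longest run seen so far in each row
  for (j in seq_len(ncol(m))) {
    # extend the run where the value is 1, reset it to 0 where the value is 0
    current <- (current + 1) * m[, j]
    best <- pmax(best, current)
  }
  best
}

result$max_run_fast <- max_run_by_row(as.matrix(test[binary_cols]))
result
```

Both columns of `result` should agree row for row; if they do not, check that the binary columns really contain only 0 and 1 (no `NA`s), since neither function handles missing values.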