user30257 - 1 month ago
R Question

In R: help using the rle() function on a data frame

I am trying to find the length of the longest run of consecutive 1s in each row of a data frame with over 1M observations of 11 binary variables. I have looked at a number of similar questions here, but none of them deal with data frames as long as mine.

I can find the runs of 1s one row at a time, but I'm looking for a solution that handles the entire data frame more elegantly.

Simple example data:

test <- data.frame(v1=c(1,0,1),v2=c(1,1,1),v3=c(0,1,1),v4=c(1,1,0),v5=c(1,1,1))
test
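# rle() expects an atomic vector, so flatten the one-row data frame with unlist()
vtest <- unlist(test[1,])
vtest

r <- rle(vtest)
r$lengths[r$values == 1]                   # lengths of all runs of 1s in row 1
row1_max <- max(r$lengths[r$values == 1])  # longest of those runs
row1_max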


What's the best way to find the longest run of consecutive 1s for every row of my data frame without having to process each row individually?

My real dataset also contains an ID# variable that uniquely identifies each record, and I ultimately want to know the longest run of 1s for each ID#, so any additional help there would be much appreciated.

Thanks in advance!

Answer

You can use apply() with MARGIN = 1 to run a function on each row of your data frame:

apply(test, 1, function(x) {
  r <- rle(x)                            # run-length encode one row
  max(r$lengths[as.logical(r$values)])   # longest run whose value is 1
})

This returns the length of the longest run of 1s in each row:

[1] 2 4 3
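With over 1M rows, apply() calls rle() once per row (after converting the data frame to a matrix). A possible alternative, sketched here under the assumption that the data frame you pass in holds only the 0/1 columns (no ID column, no NAs), loops over the 11 columns instead of the rows and keeps a running count for every row with vectorized operations, so the R-level loop runs 11 times rather than once per row. The function name max_run_of_ones is just an illustrative, hypothetical name:

```r
# Hypothetical helper: longest run of consecutive 1s in each row.
# Assumes every column of `df` is a 0/1 indicator with no NAs.
max_run_of_ones <- function(df) {
  n <- nrow(df)
  current <- integer(n)  # length of the run of 1s ending at the current column
  longest <- integer(n)  # longest run of 1s seen so far in each row
  for (col in df) {      # iterate over the columns, not the rows
    current <- ifelse(col == 1, current + 1L, 0L)  # extend or reset each row's run
    longest <- pmax(longest, current)              # keep the best run per row
  }
  longest
}

test <- data.frame(v1=c(1,0,1), v2=c(1,1,1), v3=c(0,1,1), v4=c(1,1,0), v5=c(1,1,1))
max_run_of_ones(test)
# [1] 2 4 3
```

One difference from the apply() version: a row with no 1s gives 0 here, whereas max() of an empty vector returns -Inf with a warning. For the real data, drop the ID column before calling the function.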
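For the ID part of the question: since each ID identifies one record, the per-row result can simply be stored next to the ID. The sketch below assumes the ID column is called ID and the binary columns are v1 to v5 (the names from the example data plus a hypothetical ID; adjust them to your data):

```r
# Example data with a hypothetical ID column
test <- data.frame(ID = c(101, 102, 103),
                   v1=c(1,0,1), v2=c(1,1,1), v3=c(0,1,1), v4=c(1,1,0), v5=c(1,1,1))
bin_cols <- c("v1", "v2", "v3", "v4", "v5")  # only the binary columns, not the ID

# Longest run of 1s per row, computed with the same rle() approach as the answer
max_run <- apply(test[, bin_cols], 1, function(x) {
  r <- rle(x)
  max(r$lengths[as.logical(r$values)])
})

# Keep each result next to the ID of its record
result <- data.frame(ID = test$ID, max_run = max_run)
result
```

If an ID could occur on more than one row, aggregate(max_run ~ ID, data = result, FUN = max) would collapse those rows to the longest run per ID.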