TheCurlyManLives - 4 days ago
R Question

How to find a string in a vector in r?

I have created a function that essentially creates a vector of 1000 binary values. I have been able to count the longest streak of consecutive 1s by using rle.

I was wondering how to find a specific vector (say c(1,0,0,1)) in this larger vector? I would want it to return the number of occurrences of that vector. So c(1,0,0,1,1,0,0,1) should return 2, while c(1,0,0,0,1) should return 0.

Most solutions that I have found just find whether a sequence occurs at all and return TRUE or FALSE, or they give results for the individual values, not the specific vector that is specified.

Here's my code so far:

# Simulates 1000 people who each choose either up or down
# and returns the length of the longest streak of ups.
updown <- function(){
    n = 1000
    X = rep(0,n)
    Y = rbinom(n, 1, 1 / 2)
    # X holds the labels; it is not used in the streak calculation below.
    X[Y == 1] = "up"
    X[Y == 0] = "down"

    # calculate the length of the longest streak of ups:
    Y1 <- rle(Y)
    streaks <- Y1$lengths[Y1$values == 1]
    max(streaks, na.rm=TRUE)
}

# repeat this process 1000 times to find the average outcome.
longeststring <- replicate(1000, updown())
mean(longeststring)

Answer

Since Y is only 0s and 1s, we can paste it into a string and use regex, specifically gregexpr. Simplified a bit:

set.seed(47)    # for reproducibility

Y <- rbinom(1000, 1, 1 / 2)

count_pattern <- function(pattern, x){
    sum(gregexpr(paste(pattern, collapse = ''), 
                 paste(x, collapse = ''))[[1]] > 0)
}

count_pattern(c(1, 0, 0, 1), Y)
## [1] 59

paste reduces the pattern and Y down to strings, e.g. "1001" for the pattern here, and a 1000-character string for Y. gregexpr searches for all occurrences of the pattern in Y and returns the starting indices of the matches (together with a little more information, such as the match lengths, so they can be extracted if desired). Because gregexpr returns -1 when there is no match, testing for values greater than 0 lets us simply sum the TRUE values to get the number of matches; in this case, 59.
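To make the intermediate values concrete, here is a minimal sketch of the same steps on a short vector. The names x_small, pat_str, x_str and hits are illustrative only and are not part of the answer's code:

```r
x_small <- c(1, 0, 0, 1, 1, 0, 0, 1)

pat_str <- paste(c(1, 0, 0, 1), collapse = "")   # "1001"
x_str   <- paste(x_small, collapse = "")          # "10011001"

# gregexpr returns a list with one element per input string;
# [[1]] holds the start positions of the matches (or -1 if none).
hits <- gregexpr(pat_str, x_str)[[1]]

as.integer(hits)   # start positions: 1 5
sum(hits > 0)      # number of matches: 2
```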

The other sample cases mentioned:

count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1))
## [1] 2

count_pattern(c(1,0,0,1), c(1,0,0,0,1))
## [1] 0
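
One caveat: gregexpr reports non-overlapping matches, so two occurrences that share elements, as in c(1,0,0,1,0,0,1), are counted once. If overlapping occurrences should count separately, wrapping the pattern in a lookahead with perl = TRUE is one option. A sketch under that assumption, where count_pattern_overlap is a hypothetical name rather than part of the answer above:

```r
count_pattern_overlap <- function(pattern, x) {
    # Wrap the pattern in a zero-width lookahead so that matching
    # does not consume characters and overlapping hits are all found.
    regex <- paste0("(?=", paste(pattern, collapse = ""), ")")
    sum(gregexpr(regex, paste(x, collapse = ""), perl = TRUE)[[1]] > 0)
}

count_pattern_overlap(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))
## [1] 2
```

Like the original function, this relies on every element being a single character once pasted, which holds for a vector of 0s and 1s.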