TheCurlyManLives - 11 months ago 56

R Question

I have created a function that essentially creates a vector of 1000 binary values. I have been able to count the longest streak of consecutive 1s by using `rle`.

I was wondering how to find a specific vector (say `c(1,0,0,1)`) inside a larger vector and count how many times it occurs, for example in `c(1,0,0,1,1,0,0,1)` or in `c(1,0,0,0,1)`.

Most solutions that I have found just find whether a sequence occurs at all and return `TRUE` or `FALSE`.

Here's my code so far:

```
# Simulate 1000 people who each choose either up or down.
updown <- function() {
  n <- 1000
  X <- rep(0, n)
  Y <- rbinom(n, 1, 1 / 2)
  X[Y == 1] <- "up"
  X[Y == 0] <- "down"

  # Calculate the length of the longest streak of ups:
  Y1 <- rle(Y)
  streaks <- Y1$lengths[Y1$values == 1]
  max(streaks, na.rm = TRUE)
}

# Repeat this process 1000 times to find the average outcome.
longeststring <- replicate(1000, updown())
mean(longeststring)
```

Answer Source

Since `Y` is only `0`s and `1`s, we can `paste` it into a string and use regex, specifically `gregexpr`. Simplified a bit:

```
set.seed(47) # for reproducibility
Y <- rbinom(1000, 1, 1 / 2)

count_pattern <- function(pattern, x) {
  sum(gregexpr(paste(pattern, collapse = ''),
               paste(x, collapse = ''))[[1]] > 0)
}

count_pattern(c(1, 0, 0, 1), Y)
## [1] 59
```

`paste` reduces the pattern and `Y` down to strings, e.g. `"1001"` for the pattern here, and a 1000-character string for `Y`. `gregexpr` searches for all occurrences of the pattern in `Y` and returns the indices of the matches (together with a little more information so they can be extracted, if one wanted). Because `gregexpr` will return `-1` for no match, testing for numbers greater than 0 will let us simply sum the `TRUE` values to get the number of matches; in this case, 59.
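To make the intermediate values concrete, here is a small sketch that walks through the same steps on a short input. The variable names (`pattern_string`, `x_string`, `matches`, `no_match`) are only illustrative and are not part of the function above.

```r
pattern <- c(1, 0, 0, 1)
x <- c(1, 0, 0, 1, 1, 0, 0, 1)

# Collapse both vectors into plain strings.
pattern_string <- paste(pattern, collapse = "")  # "1001"
x_string <- paste(x, collapse = "")              # "10011001"

# gregexpr returns a list with one element per input string;
# [[1]] holds the start positions of the matches.
matches <- gregexpr(pattern_string, x_string)[[1]]
as.integer(matches)   # 1 5
sum(matches > 0)      # 2

# With no match, the single returned position is -1, so the sum is 0.
no_match <- gregexpr(pattern_string, "10001")[[1]]
as.integer(no_match)  # -1
sum(no_match > 0)     # 0
```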

The other sample cases mentioned:

```
count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1))
## [1] 2
count_pattern(c(1,0,0,1), c(1,0,0,0,1))
## [1] 0
```
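One caveat: `gregexpr` reports disjoint (non-overlapping) matches, so two occurrences that share an element, as in `c(1,0,0,1,0,0,1)`, are counted once. If overlapping occurrences should count separately, a sliding-window comparison is one possible alternative. The sketch below is not part of the original answer, and `count_pattern_overlapping` is a hypothetical name chosen for illustration.

```r
# Hypothetical helper: counts every position where `pattern` starts in `x`,
# including occurrences that overlap each other.
count_pattern_overlapping <- function(pattern, x) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0 || n < k) return(0L)

  # Compare each window of length k against the pattern.
  starts <- seq_len(n - k + 1)
  hits <- vapply(
    starts,
    function(i) all(x[i:(i + k - 1)] == pattern),
    logical(1)
  )
  sum(hits)
}

count_pattern_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 1, 0, 0, 1))  # 2
count_pattern_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 0, 1))           # 0
count_pattern_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))     # 2
```

On the last input the regex-based `count_pattern` would give 1, because the second occurrence begins inside the first match. Which count is wanted depends on how a "streak" is defined for the simulation.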