Blain Waan - 1 year ago 82

R Question

We have a big

`for loop`

`R`

For example, while running the following code, we get

`rm(list=ls())`

n <- 10

z <- NULL

for(i in 1:n){

set.seed(i)

a <- rbinom(1,1,0.5)

b <- rbinom(1,1,0.5)

z[i] <- a+b

}

z

[1] 0 1 1 1 1 2 1 0 0 1

We desire to skip these steps so that we do not have any

Answer Source

Normally we do this via a `while`

loop, as the number of iterations required is unknown beforehand.

```
n <- 10L
z <- integer(n)
m <- 1L; i <- 0L
while (m <= n) {
set.seed(i)
z_i <- sum(rbinom(2L, 1, 0.5))
if (z_i > 0L) {z[m] <- z_i; m <- m + 1L}
i <- i + 1L
}
```

Output:

```
z
# [1] 1 1 1 1 1 2 1 1 1 1
i
# [1] 14
```

So we sample 14 times, 4 of which are 0 and the rest 10 are retained.

**More efficient vectorized method**

```
set.seed(0)
n <- 10L
z <- rbinom(n, 1, 0.5) + rbinom(n, 1, 0.5)
m <- length(z <- z[z > 0L]) ## filtered samples
p <- m / n ## estimated success probability
k <- round(1.5 * (n - m) / p) ## further number of samples to ensure successful (n - m) non-zero samples
z_more <- rbinom(k, 1, 0.5) + rbinom(k, 1, 0.5)
z <- c(z, z_more[which(z_more > 0)[seq_len(n - m)]])
```

Some probability theory of geometric distribution has been used here. Initially we sample `n`

samples, `m`

of which are retained. So the estimated probability of success in accepting samples is `p <- m/n`

. According to theory of Geometric distribution, on average, we need at least `1/p`

samples to observe a success. Therefore, we should at least sample `(n-m)/p`

more times to expect `(n-m)`

success. The `1.5`

is just an inflation factor. By sampling 1.5 times more samples we hopefully can ensure `(n-m)`

success.

According to Law of large numbers, the estimate of `p`

is more precise when `n`

is large. Therefore, this approach is stable for large `n`

.

If you feel that 1.5 is not large enough, use 2 or 3. But my feeling is that it is sufficient.