Toni Toni - 2 months ago 14
R Question

Understanding R convolution code with sapply()

I am trying to break apart the R code in this post:

x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)


convolve.binomial <- function(p) {
# p is a vector of probabilities of Bernoulli distributions.
# The convolution of these distributions is returned as a vector
# `z` where z[i] is the probability of i-1, i=1, 2, ..., length(p)+1.
n <- length(p) + 1
z <- c(1, rep(0, n-1))
sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q})
z
}
convolve.binomial(x)
[1] 5.826141e-05 1.068804e-03 8.233357e-03 3.565983e-02 9.775029e-02
[6] 1.804516e-01 2.323855e-01 2.127628e-01 1.394564e-01 6.519699e-02
[11] 2.141555e-02 4.799630e-03 6.979119e-04 6.038947e-05 2.647052e-06
[16] 4.091095e-08


I tried
debugging
in RStudio, but it still opaque.

The issue is with the line:
sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q})
.

I guess that within the context of the call
convolve.binomial(x)
p = q = x
. At least I get identical results if I pull the lines outside the function and run
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x})
:

x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)
n <- length(x) + 1
z <- c(1, rep(0, n-1))
# [1] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x})
z # Is extracted by calling it and contains the correct result





My questions are:


  1. What is the purpose of the
    ;q}
    ending within
    sapply()
    ?

  2. How does it relate to the
    <<-
    symbol, meant to make
    z
    accessible outside of the "implicit" loop that is
    sapply()
    ?






Below you can see my problem "hacking" this line of code:

(x_complem = 1 - x)
sapply(x, function(x) {z <<- x_complem * z + x * (c(0, z[-n])); x})
z # Returns 16 values and warnings

z_offset = c(0, z[-n])
# [1] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * z_offset; x})
z # Returns different values.

42- 42-
Answer

If you want to see the intermediate values of z as the function proceeds then insert either a cat or a print command in the code below:

sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); cat(z,"\n"); x})
#--------
0.83 0.17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0.4482 0.4736 0.0782 0 0 0 0 0 0 0 0 0 0 0 0 0 
0.170316 0.457852 0.323348 0.048484 0 0 0 0 0 0 0 0 0 0 0 0 
0.1566907 0.4348491 0.3341083 0.07047312 0.00387872 0 0 0 0 0 0 0 0 0 0 0 
0.09401443 0.3235858 0.3744046 0.1759272 0.03051648 0.001551488 0 0 0 0 0 0 0 0 0 0 
0.02256346 0.1491116 0.3357823 0.3267701 0.1410286 0.02356488 0.001179131 0 0 0 0 0 0 0 0 0 
snipped rest of output

I think this makes it clearer that what is happening is that each intermediate step represents a set of probabilities for a series of events. Each row sums to 1.0 and represents the probabilities of individual count survivals when there might be a smaller number of binomial parameters. The final result displays the probabilities of particular sums of counts after the full sequence has been assembled.