pocketlizard - 1 month ago 8

R Question

I wasn't sure if this should go in SO or some other .SE, so I will delete if this is deemed to be off-topic.

I have a vector and I'm trying to calculate the variance "by hand" (meaning based on the definition of variance but still performing the calculations in R) using the equation:

`V[X] = E[X^2] - E[X]^2`

`E[X] = sum (x * f(x))`

`E[X^2] = sum (x^2 * f(x))`

However, my calculated variance is different from the result of `var()`.

`vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)`

`range(vec)`

`counts <- hist(vec + .01, breaks = 7)$counts`

`fx <- counts / (sum(counts)) #the pmf f(x)`

`x <- c(min(vec): max(vec)) #the values of x`

`exp <- sum(x * fx) ; exp #expected value of x`

`exp.square <- sum(x^2 * fx) #expected value of x^2`

`var <- exp.square - (exp)^2 ; var #calculated variance`

`var(vec)`

This gives me a calculated variance of 2.234, but `var()` returns a different value.

Answer

While `V[X] = E[X^2] - E[X]^2` is the *population variance* (when the values in the vector are the whole population, not just a sample), the `var` function calculates an *estimator* for the population variance (the *sample variance*), which divides the sum of squared deviations by `n - 1` instead of `n`.
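
As a rough check, the two quantities should differ only by the factor `n / (n - 1)`. The sketch below reuses `vec` from the question; the names `pop_var` and `samp_var` are made up here for illustration, and `mean(vec^2)` stands in for the histogram-based pmf, which gives the same expectation when every observation has weight `1/n`.

```r
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
n <- length(vec)

# Population variance from the definition E[X^2] - E[X]^2 (divides by n)
pop_var <- mean(vec^2) - mean(vec)^2

# Sample variance: rescale the population variance by n / (n - 1)
samp_var <- pop_var * n / (n - 1)

pop_var                        # 2.234375
samp_var                       # 2.383333
all.equal(samp_var, var(vec))  # TRUE
```

So the 2.234 from the question is the population variance, and multiplying it by `16 / 15` reproduces what `var(vec)` reports.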