pocketlizard - 1 year ago
R Question

Why is the var() function giving me a different answer than my calculated variance?

I wasn't sure if this should go in SO or some other .SE, so I will delete if this is deemed to be off-topic.

I have a vector and I'm trying to calculate the variance "by hand" (meaning based on the definition of variance but still performing the calculations in R) using the equation:

V[X] = E[X^2] - E[X]^2
E[X] = sum (x * f(x))
E[X^2] = sum (x^2 * f(x))

However, my calculated variance is different from the value returned by R's var() function (which I was using to check my work). Why is the var() function different? How is it calculating variance? I've checked my calculations several times so I'm fairly confident in the value I calculated. My code is provided below.

vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
counts <- hist(vec + .01, breaks = 7)$counts
fx <- counts / (sum(counts)) #the pmf f(x)
x <- c(min(vec): max(vec)) #the values of x
exp <- sum(x * fx) ; exp #expected value of x
exp.square <- sum(x^2 * fx) #expected value of x^2
var <- exp.square - (exp)^2 ; var #calculated variance

This gives me a calculated variance of 2.234 but the var() function says the variance is 2.383.

Answer

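V[X] = E[X^2] - E[X]^2 is the population variance, which applies when the values in the vector are the whole population, not just a sample. The var function instead calculates an estimator of the population variance (the sample variance), which divides the sum of squared deviations by n - 1 rather than n. With n = 16 values here, 2.234 * 16/15 = 2.383, which accounts for the difference between the two results.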
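
As a quick check, here is a minimal sketch using the vector from the question. The variable names (pop_var, samp_var, rescaled) are only illustrative and do not come from the original code. It computes the population variance straight from the definition and then rescales it by n / (n - 1) to compare against var():

```r
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
n <- length(vec)

# Population variance from the definition E[X^2] - E[X]^2 (divides by n)
pop_var <- mean(vec^2) - mean(vec)^2

# Sample variance as returned by var() (divides by n - 1)
samp_var <- var(vec)

# Rescaling the population variance by n / (n - 1) (illustrative name)
rescaled <- pop_var * n / (n - 1)

print(round(c(pop_var, samp_var, rescaled), 3))
```

If the population variance is what you actually want, multiplying var(vec) by (n - 1) / n should give the same value as the calculation by hand.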
