pocketlizard - 1 month ago
R Question

Why is the var() function giving me a different answer than my calculated variance?

I wasn't sure whether this belongs on Stack Overflow or another Stack Exchange site, so I will delete it if it is deemed off-topic.

I have a vector and I'm trying to calculate the variance "by hand" (that is, from the definition of variance, while still performing the calculations in R) using the equation:

$$V[X] = E[X^2] - (E[X])^2$$

where f(x) is the pmf and

$$E[X] = \sum_x x\, f(x) \qquad\text{and}\qquad E[X^2] = \sum_x x^2 f(x).$$
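With the empirical pmf, each observation gets weight 1/n, so the two sums above reduce to plain means of x and x^2. A minimal sketch in R, assuming the vector is treated as the whole population (the names `ex`, `ex2` and `pop_var` are only illustrative):

```r
# Assumption: vec is the entire population, so f(x) = count(x) / n
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)

ex  <- mean(vec)       # E[X]   = sum(x * f(x))   = 66 / 16
ex2 <- mean(vec^2)     # E[X^2] = sum(x^2 * f(x)) = 308 / 16
pop_var <- ex2 - ex^2  # V[X] = E[X^2] - (E[X])^2
pop_var                # [1] 2.234375
```

Grouping equal observations into f(x) before summing does not change the sums, so this gives the same value as the pmf-based calculation.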


However, my calculated variance differs from the value returned by R's var() function (which I was using to check my work). Why is var() different? How is it calculating variance? I've checked my calculations several times, so I'm fairly confident in the value I calculated. My code is below.

vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
range(vec)
x <- min(vec):max(vec)                     # the values of x (every integer from min to max)
counts <- table(factor(vec, levels = x))   # frequency of each value, zeros included
fx <- as.vector(counts) / sum(counts)      # the pmf f(x)
ex <- sum(x * fx) ; ex                     # expected value of x
ex.square <- sum(x^2 * fx)                 # expected value of x^2
var.calc <- ex.square - ex^2 ; var.calc    # calculated variance
var(vec)                                   # R's built-in variance, for comparison


This gives me a calculated variance of 2.234, but the var() function says the variance is 2.383.

Answer

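While $V[X] = E[X^2] - (E[X])^2$ is the population variance (appropriate when the values in the vector are the whole population, not just a sample), the var() function calculates the sample variance, an estimator of the population variance. It divides the sum of squared deviations from the mean by $n - 1$ instead of $n$, which makes it unbiased for i.i.d. observations. With $n = 16$, var() therefore returns your value scaled by $n/(n-1)$: $2.234375 \times 16/15 \approx 2.383$.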
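A quick check of the two denominators using only base R (`mean`, `sum`, `length`, `var`, `all.equal`). The names `ss`, `pop_var` and `sample_var` are illustrative, not part of any API:

```r
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
n <- length(vec)

ss <- sum((vec - mean(vec))^2)  # sum of squared deviations from the mean (35.75)
pop_var    <- ss / n            # divide by n: matches E[X^2] - (E[X])^2
sample_var <- ss / (n - 1)      # divide by n - 1: what var() computes

all.equal(sample_var, var(vec))             # TRUE
all.equal(pop_var, var(vec) * (n - 1) / n)  # TRUE
```

So if you want the population variance from var(), multiply its result by (n - 1) / n.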