pocketlizard - 1 year ago 92
R Question

# Why is the var() function giving me a different answer than my calculated variance?

I wasn't sure if this should go in SO or some other .SE, so I will delete if this is deemed to be off-topic.

I have a vector and I'm trying to calculate the variance "by hand" (meaning based on the definition of variance but still performing the calculations in R) using the equation:

`V[X] = E[X^2] - E[X]^2`
where
`E[X] = sum (x * f(x))`
and
`E[X^2] = sum (x^2 * f(x))`

However, my calculated variance is different from the `var()` function that R has (which I was using to check my work). Why is the `var()` function different? How is it calculating variance? I've checked my calculations several times so I'm fairly confident in the value I calculated. My code is provided below.

```r
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
range(vec)
counts <- hist(vec + .01, breaks = 7)$counts
fx <- counts / sum(counts)           # the pmf f(x)
x <- min(vec):max(vec)               # the values of x
exp <- sum(x * fx); exp              # expected value of x
exp.square <- sum(x^2 * fx)          # expected value of x^2
var <- exp.square - (exp)^2; var     # calculated variance
var(vec)
```

This gives me a calculated variance of 2.234 but the `var()` function says the variance is 2.383.

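`V[X] = E[X^2] - E[X]^2` is the population variance: it applies when the values in the vector are the whole population, not just a sample. The `var()` function instead calculates the sample variance, an estimator of the population variance that divides the sum of squared deviations by `n - 1` rather than `n`. With `n = 16` here, the two results differ by a factor of 16/15: 2.234 × 16/15 ≈ 2.383.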
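
To check this numerically, here is a minimal sketch that recomputes both quantities for the vector in the question. It uses `table()` to build the empirical pmf instead of the `hist()` approach above, and the names `pop_var` and `samp_var` are illustrative only (they also avoid masking the built-in `var` and `exp`). Rescaling the output of `var()` by `(n - 1) / n` should reproduce the hand-calculated value.

```r
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
n <- length(vec)

# Empirical pmf: relative frequency of each distinct value in vec
fx <- table(vec) / n
x <- as.numeric(names(fx))

# Population variance from the definition V[X] = E[X^2] - E[X]^2
pop_var <- sum(x^2 * fx) - sum(x * fx)^2

# Sample variance as returned by var(), which divides by n - 1
samp_var <- var(vec)

pop_var                  # about 2.234
samp_var                 # about 2.383
samp_var * (n - 1) / n   # rescaled; matches pop_var
```

If the population variance is what you want, multiplying `var(vec)` by `(n - 1) / n` is usually simpler than building the pmf by hand.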