minks - 10 months ago
Python Question

How can I calculate the variance of a list in python?

If I have a list like this:

results=[-14.82381293 -0.29423447 -13.56067979 -1.6288903 -0.31632439
0.53459687 -1.34069996 -1.61042692 -4.03220519 -0.24332097]

I want to calculate the variance of this list in Python.

Variance = The average of the squared differences from the mean.

How can I go about this? I am confused about how to access the elements of the list in order to compute the squared differences.

Answer Source

Just use numpy's built-in function var (and add commas to your list):

import numpy as np

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
          0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

print(np.var(results))

This gives you 28.822364260579157

If for whatever reason you cannot use numpy, or you would rather not use a built-in function, you can also calculate it by hand, for example with a list comprehension:

# calculate mean
m = sum(results) / len(results)

# calculate variance using a list comprehension
varRes = sum([(xi - m)**2 for xi in results]) / len(results)

which gives you the same result (up to floating-point rounding).
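If numpy is not available but you are on Python 3.4 or later, the standard library's statistics module is another option. The following is only a minimal sketch, assuming the same results list as above; the name pop_var is just illustrative:

```python
import statistics

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

# pvariance divides by n (population variance), like np.var with its default ddof=0
pop_var = statistics.pvariance(results)
print(pop_var)
```

The printed value should agree with the numpy result above, apart from possible differences in the last digits.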


@Serge Ballesta explained very well the difference between dividing by n (population variance) and by n-1 (sample variance). In numpy you can easily switch between the two using the ddof option; its default is 0, so for the n-1 case you can simply do:

np.var(results, ddof=1)

The "by hand" solution would be:

sum([(xi - m)**2 for xi in results]) / (len(results) - 1)

Both approaches give you 32.024849178421285.
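To keep both cases in one place, the by-hand calculation could be wrapped in a small helper. This is just a sketch: the function name variance and its ddof argument are hypothetical, chosen to mirror numpy's option, and are not part of any library.

```python
def variance(values, ddof=0):
    """Variance of values: ddof=0 divides by n, ddof=1 divides by n - 1."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(values) / n
    # sum of squared differences from the mean, divided by (n - ddof)
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

population = variance(results)      # divides by n
sample = variance(results, ddof=1)  # divides by n - 1
print(population, sample)
```

The two values differ only by the factor n / (n - 1), which is 10 / 9 for this list.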