Lodore66 - 1 year ago 164
Python Question

# Getting nan for p values in scipy chisquare: Don't know why?

Would be very grateful for help with this issue. It seems like it should be straightforward.

I have two columns in a pandas dataframe called Totals: Totals['Connections'] and Totals['Expected']. Totals['Connections'] contains the observed number of incidences of my relevant variable; Totals['Expected'] contains the expected number of observations. I want to compare the two using the scipy chisquare function. I do this in the following way:

```python
sp.stats.chisquare([Totals.Connections], f_exp=[Totals.Expected])
```

However, when I do, I get a valid test statistic but `nan` for my p-value (see below). Also, what does the `Power_divergence` text at the start of the result mean? Can anyone explain what I'm doing wrong here?

```
Power_divergenceResult(statistic=array([  1.05408049e+03,   6.30832196e+02,   7.02987722e+01,
9.17326057e+00,   1.56193724e+01,   3.36275580e+01,
6.16076398e+02,   1.50373806e+02,   2.94802183e+01,
2.65321965e+02,   1.00900409e+01,   3.06515689e+02,
1.38828104e+02,   3.68894952e+02,   1.92873124e+02,
5.67564802e+02,   2.36092769e+02,   1.77298772e+03,
3.55388267e+03,   6.42013643e+02,   1.55858117e+02,
1.22783083e+02,   1.36425648e-03,   2.47579809e+02,
2.36092769e+02,   7.02987722e+01,   1.23124147e+01,
6.10587995e+02,   2.75088677e+01,   2.76261937e+02,
2.00121419e+02,   4.97702592e+02,   2.01167804e+02,
1.26909959e+02,   2.60530696e+02,   6.66316508e+01,
2.15019100e+02]), pvalue=array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
nan,  nan,  nan,  nan]))
```

Your arguments should each be one-dimensional, but the extra brackets around them add a dimension: `[Totals.Connections]` is a list holding one Series, which `chisquare` converts to a 2-D array with a single row. Remove those extra brackets:

```python
sp.stats.chisquare(Totals.Connections, f_exp=Totals.Expected)
```
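A minimal, self-contained sketch of the difference, assuming a small hypothetical `Totals` DataFrame (the counts below are invented for illustration, not the asker's data). It shows that the bracketed Series becomes a 2-D array with one row, while the unbracketed call runs a single test and returns one scalar p-value:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the question's Totals DataFrame; the counts are
# invented and both columns share the same total (60).
Totals = pd.DataFrame({"Connections": [10, 20, 30],
                       "Expected": [20, 20, 20]})

# Wrapping a Series in a list adds a leading axis: shape (1, 3) instead of (3,).
print(np.asarray([Totals.Connections]).shape)
print(np.asarray(Totals.Connections).shape)

# Passing the Series directly runs one test over all three categories.
result = stats.chisquare(Totals.Connections, f_exp=Totals.Expected)
print(result.statistic, result.pvalue)
```

With three categories the test has 2 degrees of freedom, and for 2 degrees of freedom the chi-square survival function is exp(-x/2), so the statistic of 10 here gives a p-value of exp(-5), roughly 0.0067.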

For example, here's a typical use of `chisquare`:

```
In [49]: chisquare([4, 4, 5, 5], [4, 3, 7, 4])
Out[49]: Power_divergenceResult(statistic=1.1547619047619047, pvalue=0.76387343970439647)
```
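On the `Power_divergenceResult` name the question asks about: as far as I know, `chisquare` is implemented as the Pearson special case of `scipy.stats.power_divergence` (the Cressie-Read power-divergence family), so it reuses that function's result type, a named tuple with `statistic` and `pvalue` fields. A quick sketch checking that the two calls agree on the example above:

```python
from scipy import stats

observed = [4, 4, 5, 5]
expected = [4, 3, 7, 4]

# chisquare computes Pearson's statistic; power_divergence with
# lambda_="pearson" computes the same statistic and p-value.
res_chi = stats.chisquare(observed, f_exp=expected)
res_pd = stats.power_divergence(observed, f_exp=expected, lambda_="pearson")

# Read the fields by name instead of parsing the printed repr.
print(res_chi.statistic, res_chi.pvalue)
print(res_pd.statistic, res_pd.pvalue)
```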

If you wrap the arguments in an extra level of brackets, they become two-dimensional, and `chisquare` is applied to each column (because the default is `axis=0`); here every column holds just one value:

```
In [50]: chisquare([[4, 4, 5, 5]], [[4, 3, 7, 4]])
Out[50]: Power_divergenceResult(statistic=array([ 0.        ,  0.33333333,  0.57142857,  0.25      ]), pvalue=array([ nan,  nan,  nan,  nan]))
```

That calculation is the same as calling `chisquare` four times, once for each column of the arguments. When an argument has length 1, the test has k - 1 = 0 degrees of freedom (k being the number of categories), so the p-value is `nan`:

```
In [59]: chisquare([4], [4])
Out[59]: Power_divergenceResult(statistic=0.0, pvalue=nan)

In [60]: chisquare([4], [3])
Out[60]: Power_divergenceResult(statistic=0.33333333333333331, pvalue=nan)

In [61]: chisquare([5], [7])
Out[61]: Power_divergenceResult(statistic=0.5714285714285714, pvalue=nan)

In [62]: chisquare([5], [4])
Out[62]: Power_divergenceResult(statistic=0.25, pvalue=nan)
```
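The `nan` follows from how the p-value is computed: a Pearson test on k categories uses a chi-square distribution with k - 1 degrees of freedom, and SciPy's `chi2` distribution is only defined for df > 0. The helper `pearson_chisquare` below is a hypothetical, hand-rolled sketch (not SciPy's own code) that makes the degrees of freedom visible. One hedged caveat: newer SciPy releases also check that observed and expected totals agree along the test axis, so a call like `chisquare([4], [3])` may raise `ValueError` there rather than return `nan`.

```python
import numpy as np
from scipy import stats

def pearson_chisquare(observed, expected):
    """Hypothetical helper: Pearson's test on 1-D input, computed by hand."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    statistic = np.sum((observed - expected) ** 2 / expected)
    dof = observed.size - 1                  # k categories -> k - 1 degrees of freedom
    pvalue = stats.chi2.sf(statistic, dof)   # survival function, i.e. 1 - CDF
    return statistic, dof, pvalue

# Four categories: 3 degrees of freedom, so the p-value is defined.
print(pearson_chisquare([4, 4, 5, 5], [4, 3, 7, 4]))

# One category: 0 degrees of freedom; chi2 with df=0 is invalid, giving nan.
print(pearson_chisquare([4], [4]))
```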

To get the expected result while keeping the extra brackets, pass `axis=1`:

```
In [63]: chisquare([[4, 4, 5, 5]], [[4, 3, 7, 4]], axis=1)
Out[63]: Power_divergenceResult(statistic=array([ 1.1547619]), pvalue=array([ 0.76387344]))
```
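Besides `axis=1`, `chisquare` also accepts `axis=None`, which treats all values as a single data set, and you can also flatten the inputs yourself before calling it. A short sketch comparing the three on the same toy numbers:

```python
import numpy as np
from scipy import stats

observed = [[4, 4, 5, 5]]   # extra brackets: shape (1, 4)
expected = [[4, 3, 7, 4]]

# axis=1 tests along each row; there is one row, so the result has one element.
row_wise = stats.chisquare(observed, expected, axis=1)

# axis=None pools every value into a single data set.
pooled = stats.chisquare(observed, expected, axis=None)

# Dropping the extra dimension first is equivalent to removing the brackets.
flat = stats.chisquare(np.ravel(observed), np.ravel(expected))

print(row_wise.pvalue)
print(pooled.pvalue, flat.pvalue)
```

Flattening (or simply not adding the brackets) is usually the clearest choice, since the result is then a plain scalar rather than a one-element array.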