piRSquared - 1 year ago 307

Python Question

I'm calculating a coskew matrix and wanted to double-check my calculation against pandas' built-in `skew`. I define my series as:

    import pandas as pd

    series = pd.Series(
        {0: -0.051917457635120283,
         1: -0.070071606515280632,
         2: -0.11204865874074735,
         3: -0.14679988245503134,
         4: -0.088062467095565145,
         5: 0.17579741198527793,
         6: -0.10765856028420773,
         7: -0.11971470229167547,
         8: -0.15169210769159247,
         9: -0.038616800990881606,
         10: 0.16988162977411481,
         11: 0.092999418364443032}
    )

I compared the following calculations and expected them to be the same.

`series.skew()`

1.1119637586658944

`(((series - series.mean()) / series.std(ddof=0)) ** 3).mean()`

0.967840223081231

These differ significantly. I thought pandas might be using the adjusted Fisher-Pearson coefficient, so I tried:

    n = len(series)
    skew = series.sub(series.mean()).div(series.std(ddof=0)).apply(lambda x: x ** 3).mean()
    skew * (n * (n - 1)) ** 0.5 / (n - 1)

1.0108761442417222

Still off by quite a bit.

How does pandas calculate skew?

Answer

I found that `scipy.stats.skew` with the parameter `bias=False` returns the same output, so I think pandas' `skew` applies the bias correction by default, as if `bias=False` were set. The scipy documentation describes the parameter as:

bias : bool

If False, then the calculations are corrected for statistical bias.

```
import pandas as pd
from scipy import stats
series = pd.Series(
{0: -0.051917457635120283,
1: -0.070071606515280632,
2: -0.11204865874074735,
3: -0.14679988245503134,
4: -0.088062467095565145,
5: 0.17579741198527793,
6: -0.10765856028420773,
7: -0.11971470229167547,
8: -0.15169210769159247,
9: -0.038616800990881606,
10: 0.16988162977411481,
11: 0.092999418364443032}
)
print(series.skew())
# 1.11196375867
print(stats.skew(series, bias=False))
# 1.1119637586658944
```
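For contrast, `scipy.stats.skew` defaults to `bias=True`, which should reproduce the uncorrected value from the question. A minimal sketch with the same data (the names `values`, `biased`, and `manual` are only illustrative):

```python
import numpy as np
from scipy import stats

values = np.array([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606, 0.16988162977411481, 0.092999418364443032,
])

# Default bias=True: plain ratio of central moments, m3 / m2**1.5
biased = stats.skew(values)

# The same quantity written out by hand (np.std uses ddof=0 by default)
manual = (((values - values.mean()) / values.std()) ** 3).mean()

print(biased, manual)  # both should be about 0.9678
```

So the gap between 0.9678 and 1.1120 comes entirely from the bias correction.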

I'm not 100% sure, but I think I found it in the code.

EDIT (piRSquared)

From the `scipy` `skew` source code:

```
if not bias:
    can_correct = (n > 2) & (m2 > 0)
    if can_correct.any():
        m2 = np.extract(can_correct, m2)
        m3 = np.extract(can_correct, m3)
        nval = ma.sqrt((n-1.0)*n)/(n-2.0)*m3/m2**1.5
        np.place(vals, can_correct, nval)
return vals
```

The adjustment was `(n * (n - 1)) ** 0.5 / (n - 2)`, not `(n * (n - 1)) ** 0.5 / (n - 1)` as I had assumed.
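Putting that together, here is a small sketch that applies the `(n - 2)` correction by hand and compares it with `series.skew()`. The variable names `g1` and `G1` are just labels for the uncorrected and corrected values, not anything pandas exposes:

```python
import pandas as pd

series = pd.Series([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606, 0.16988162977411481, 0.092999418364443032,
])
n = len(series)

# Uncorrected skewness: mean of cubed z-scores, using the population std (ddof=0)
g1 = (((series - series.mean()) / series.std(ddof=0)) ** 3).mean()

# Sample-size correction with (n - 2) in the denominator
G1 = g1 * (n * (n - 1)) ** 0.5 / (n - 2)

print(G1)             # about 1.11196
print(series.skew())  # about 1.11196
```

With `n = 12` the factor is `132 ** 0.5 / 10`, roughly 1.149, which scales 0.9678 up to the 1.1120 that pandas reports.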