Kelaref - 1 year ago 246

Python Question

I have a dataframe zdf that looks like the following:

    Index       A   B   C   Mean
    2008-11-21  23  12  16  18
    2008-11-24  26  14  15  17
    2008-11-25  28  20  21  25
    2008-11-26  25  26  27  26

I am trying to apply a two-sided t-test to each row and store the result in a new column, using `stats.ttest_1samp` (imported with `from scipy.stats import stats`).

Its first parameter is a list (all values in a row except the last), and its second parameter is the mean (the last column in zdf). It returns two values: the t-statistic and the p-value. I am trying the following:

    for i in range(zdf.shape[0]+1):
        zdf.ix[i,'ttest'] = stats.ttest_1samp(list(zdf.iloc[i,:-1]),zdf.iloc[i,-1])

I keep getting a `ValueError` for some reason, but surely there's a better way to apply this without a for loop?

Thank you in advance.


Answer Source

You can't set an array element with a sequence using `.ix[]`: `ttest_1samp` returns two values, so they cannot both be written into one cell.

So you need to assign each value to its own column, such as:

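```
from scipy import stats

for idx in zdf.index:  # one pass per row; range(zdf.shape[0]+1) runs one past the end
    row = zdf.loc[idx, ['A', 'B', 'C']].values  # select by name so new columns are not fed back in
    t, p = stats.ttest_1samp(row, zdf.loc[idx, 'Mean'])  # returns (t-statistic, p-value)
    zdf.loc[idx, 'ttest_res1'], zdf.loc[idx, 'ttest_res2'] = t, p
```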

Also, I would pass an array instead of a list as the first argument, using `.values`.
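On the "without a for loop" part of the question, here is a minimal sketch of a vectorized version. It assumes the sample columns are named `A`, `B` and `C` and that the dates are the index, as the question's table suggests; the output column names `t_stat` and `p_value` are my own choice. It relies on the fact that testing a sample against a mean is the same as testing the differences against zero, which avoids passing an array as `popmean`.

```python
import pandas as pd
from scipy import stats

# Rebuild the example frame from the question (dates assumed to be the index).
zdf = pd.DataFrame(
    {
        "A": [23, 26, 28, 25],
        "B": [12, 14, 20, 26],
        "C": [16, 15, 21, 27],
        "Mean": [18, 17, 25, 26],
    },
    index=pd.to_datetime(["2008-11-21", "2008-11-24", "2008-11-25", "2008-11-26"]),
)

# Pull the samples and the hypothesised means out as float arrays.
samples = zdf[["A", "B", "C"]].to_numpy(dtype=float)   # shape (n_rows, 3)
popmean = zdf["Mean"].to_numpy(dtype=float)            # shape (n_rows,)

# Subtract each row's mean, then test every row against 0 in a single call.
t_stat, p_val = stats.ttest_1samp(samples - popmean[:, None], 0.0, axis=1)

# Whole-column assignment: one value per row, so no cell receives a sequence.
zdf["t_stat"] = t_stat
zdf["p_value"] = p_val
```

Selecting the sample columns by name rather than with `iloc[:, :-1]` keeps the result columns from being picked up as data if the code is run a second time.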
