Kelaref - 1 year ago
Python Question

Pandas: apply stats.ttest_1samp on every row

I have a dataframe zdf that looks like the following:

Index A B C Mean
2008-11-21 23 12 16 18
2008-11-24 26 14 15 17
2008-11-25 28 20 21 25
2008-11-26 25 26 27 26

I am trying to apply a two-sided t-test to each row and store the result in a new column, using stats.ttest_1samp, imported with:

from scipy.stats import stats

It takes a list as its first parameter (all values in each row except the last) and the mean as its second parameter (the last column in zdf). It returns two values: the t-statistic and the p-value. I am trying the following:

for i in range(zdf.shape[0]+1):
    zdf.ix[i,'ttest'] = stats.ttest_1samp(list(zdf.iloc[i,:-1]),zdf.iloc[i,-1])

I keep getting a ValueError for some reason. Surely there is a better way to apply this without a for loop?

Thank you in advance.

Answer Source

You can't set a single cell to a sequence with .ix[], and ttest_1samp returns two values, so you need to assign one scalar at a time, such as:

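for idx in zdf.index:
    # .ix is gone from recent pandas, so use .loc; select columns by name so the new result columns are not picked up
    res = stats.ttest_1samp(zdf.loc[idx, ['A', 'B', 'C']].values.astype(float), zdf.loc[idx, 'Mean'])
    zdf.loc[idx, 'ttest_res1'] = res[0]  # t-statistic
    zdf.loc[idx, 'ttest_res2'] = res[1]  # p-value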

Also, I would pass an array rather than a list as the first argument, using .values.
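
On the second part of the question (avoiding the explicit loop), one option is DataFrame.apply with axis=1. The following is only a minimal sketch, assuming the value columns are named A, B and C and the hypothesised mean sits in the Mean column, as in the sample data. The names row_ttest, t_stat, p_value and result_df are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Sample data copied from the question
zdf = pd.DataFrame(
    {
        "A": [23, 26, 28, 25],
        "B": [12, 14, 20, 26],
        "C": [16, 15, 21, 27],
        "Mean": [18, 17, 25, 26],
    },
    index=pd.to_datetime(["2008-11-21", "2008-11-24", "2008-11-25", "2008-11-26"]),
)

value_cols = ["A", "B", "C"]  # assumption: these are the sample columns


def row_ttest(row):
    """Run a two-sided one-sample t-test on one row against its 'Mean' value."""
    sample = row[value_cols].to_numpy(dtype=float)
    result = stats.ttest_1samp(sample, row["Mean"])
    # Return both numbers as a labelled Series so apply() builds two columns
    return pd.Series({"t_stat": result.statistic, "p_value": result.pvalue})


# apply(axis=1) calls row_ttest once per row; join attaches the two new columns
result_df = zdf.join(zdf.apply(row_ttest, axis=1))
print(result_df)
```

Note that apply still iterates over the rows in Python, so this is tidier than the loop rather than faster. Returning a Series from the function is what lets both the t-statistic and the p-value land in separate columns without the ValueError.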
