Edward - 1 month ago
Python Question

Loop for pandas columns

I want to apply the Kruskal-Wallis test (scipy.stats.kruskal) to several columns. This is what I do:

import pandas as pd
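import scipy.stats  # import the submodule explicitly; older SciPy versions do not load scipy.stats on "import scipy"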
df = pd.DataFrame({'a':range(9), 'b':[1,2,3,1,2,3,1,2,3], 'group':['a', 'b', 'c']*3})


and then the loop:

groups = {}
res = []
for grp in df['group'].unique():
    for column in df[['a', 'b']]:  # iterating a DataFrame yields its column labels
        groups[grp] = df[column][df['group'] == grp].values
args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print(res)


I get

[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]


But I want:

[KruskalResult(statistic=0.80000000000000071, pvalue=0.67032004603563911)]
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]


Where is my mistake?

For a single column, I do the following:

import pandas as pd
import scipy.stats
df = pd.DataFrame({'numbers': range(9), 'group': ['a', 'b', 'c'] * 3})
groups = {}
for grp in df['group'].unique():
    groups[grp] = df['numbers'][df['group'] == grp].values
print(groups)
args = groups.values()
print(scipy.stats.kruskal(*args))
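The dictionary-building loop for a single column can also be expressed with pandas groupby, which splits the column by group in one step. This is a minimal alternative sketch, not the original poster's code; it assumes only the same toy DataFrame and passes one array per group to scipy.stats.kruskal.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'numbers': range(9), 'group': ['a', 'b', 'c'] * 3})

# Iterating a grouped Series yields (group name, sub-Series) pairs;
# keep only the values, one array per group
samples = [g.values for _, g in df.groupby('group')['numbers']]

# Each group's sample is passed as a separate positional argument
result = stats.kruskal(*samples)
print(result.statistic, result.pvalue)
```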

Answer

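Your for loops are nested the wrong way round. With the group loop on the outside, the inner loop overwrites groups[grp] with each column in turn, so only the last column ('b') survives, and the test then runs just once, after both loops have finished. The single-column procedure (split one column by group, then call kruskal) is what has to be repeated for every column, so the column loop must be the outer loop. In plain English: "for each column, apply the Kruskal test, which consists of this loop over group.unique()":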

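res = []
for column in ['a', 'b']:  # outer loop: one Kruskal test per column
    groups = {}  # fresh dict for each column
    for grp in df['group'].unique():  # inner loop: split this column by group
        groups[grp] = df[column][df['group'] == grp].values
    args = groups.values()
    g = scipy.stats.kruskal(*args)
    res.append(g)
print(res)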
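If the test needs to run on many columns, the same "columns outside, groups inside" structure can be wrapped in a small function. The helper below, kruskal_by_column, is a hypothetical name introduced for illustration; it is a sketch that uses groupby for the inner split and returns a dict keyed by column name, so each result stays labelled.

```python
import pandas as pd
from scipy import stats


def kruskal_by_column(df, columns, group_col):
    """Hypothetical helper: Kruskal-Wallis test of group_col's groups, one test per column."""
    results = {}
    for column in columns:  # outer loop: one test per column
        # Inner step: split this column by group; samples are rebuilt for every column
        samples = [g.values for _, g in df.groupby(group_col)[column]]
        results[column] = stats.kruskal(*samples)
    return results


df = pd.DataFrame({'a': range(9), 'b': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'group': ['a', 'b', 'c'] * 3})
res = kruskal_by_column(df, ['a', 'b'], 'group')
for column, r in res.items():
    print(column, r.statistic, r.pvalue)
```

Returning a dict rather than a list keeps each result attached to its column name, which avoids relying on list order when more columns are added.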