Edward - 10 months ago 53

Python Question

I want to apply the Kruskal test to several columns. I do it as below:

```
import pandas as pd
import scipy.stats

df = pd.DataFrame({'a':range(9), 'b':[1,2,3,1,2,3,1,2,3], 'group':['a', 'b', 'c']*3})
```

and then the loop:

```
groups = {}
res = []
for grp in df['group'].unique():
    for column in df[[0, 1]]:
        groups[grp] = df[column][df['group']==grp].values
args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print (res)
```

I get

`[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]`

But I want

```
[KruskalResult(statistic=0.80000000000000071, pvalue=0.67032004603563911)]
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]
```

Where is my mistake?

For a single column I do it as below:

```
import pandas as pd
import scipy.stats

df = pd.DataFrame({'numbers':range(9), 'group':['a', 'b', 'c']*3})

groups = {}
for grp in df['group'].unique():
    groups[grp] = df['numbers'][df['group']==grp].values
print(groups)
args = groups.values()
scipy.stats.kruskal(*args)
```

Answer

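Your for loops are nested the wrong way round. The one-column algorithm is the unit of work that stays the same whichever column you choose, so the column loop must be the outer loop. As written, the inner loop overwrites `groups[grp]` for every column, so only the last column survives, and `kruskal` is called just once after both loops have finished. In plain English: "for each column, apply the Kruskal algorithm, which consists of the loop over `df['group'].unique()`":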

```
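res = []
# df[[0, 1]] raises a KeyError on current pandas, so select the columns by label
for column in df[['a', 'b']]:
    groups = {}  # start from an empty dict for every column
    for grp in df['group'].unique():
        groups[grp] = df[column][df['group'] == grp].values
    # run the test once per column, after all groups are collected
    args = groups.values()
    g = scipy.stats.kruskal(*args)
    res.append(g)
print(res)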
```
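
If you would rather not build the `groups` dict by hand, the same idea can be sketched with `groupby`. This is only one possible way to write it, not the only correct one: the helper name `kruskal_by_group` and its parameters are made up for illustration, and the example assumes the columns are selected by their labels `'a'` and `'b'`.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'a': range(9),
                   'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'group': ['a', 'b', 'c'] * 3})


def kruskal_by_group(frame, value_columns, group_column):
    """Run one Kruskal-Wallis test per value column (hypothetical helper).

    The rows of each value column are split into samples by group_column.
    """
    results = {}
    for column in value_columns:
        # One array of values per group, e.g. the 'a' values for group 'a', 'b', 'c'
        samples = [values.to_numpy()
                   for _, values in frame.groupby(group_column)[column]]
        results[column] = stats.kruskal(*samples)
    return results


results = kruskal_by_group(df, ['a', 'b'], 'group')
for column, result in results.items():
    print(column, result.statistic, result.pvalue)
```

Returning a dict keyed by column name keeps each result attached to the column it came from, which is easier to read than a plain list once there are more than two columns.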