ranadan - 5 months ago
Python Question

Sorting each row and getting the column names in order of their values (Python)

I am trying to sort a data frame that looks something like this:

user_name tag1 tag2 tag3 tag4
user1 .65 .32 .91 0
user2 .34 .44 .21 .56
user3 .21 0 0 .19


I need to sort each row by its values and get, for each row, the column names ordered from the highest value to the lowest. Columns with a value of 0 for a particular user should be left out. The output should look something like this:

user_name 0 1 2 3
user1 tag3 tag1 tag2
user2 tag4 tag2 tag1 tag3
user3 tag1 tag4


The transpose of this would also work. I need to do this in Python 2.7. Thank you.
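A minimal sketch of the requested layout (columns 0, 1, 2, 3 per user), assuming pandas is available, that every column after `user_name` is a tag score, and that a score of exactly 0 means the tag does not apply. The sample frame is rebuilt from the table above, and `ranked_tags` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data from the question; every column after user_name is a tag score.
df = pd.DataFrame({
    'user_name': ['user1', 'user2', 'user3'],
    'tag1': [.65, .34, .21],
    'tag2': [.32, .44, 0],
    'tag3': [.91, .21, 0],
    'tag4': [0, .56, .19],
})

def ranked_tags(row):
    """Hypothetical helper: names of the non-zero tags, highest score first."""
    scores = row.drop('user_name').astype(float)
    return list(scores[scores != 0].sort_values(ascending=False).index)

# One list per user; the lists can have different lengths.
ranked = df.apply(ranked_tags, axis=1).tolist()

# Building a frame from ragged lists pads the short rows with missing values.
result = pd.DataFrame(ranked, index=df['user_name'])
print(result)
```

Building the frame from plain lists keeps the tag names left-aligned per user, which matches the requested output; `result.T` gives the transposed form.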

Answer

If you replace the 0 values with NaN, you can then apply a function to each row that sorts the column names by value and blanks out the names whose value is NaN:

In [28]:
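import numpy as np
import pandas as pd
df = df.replace(0, np.nan)
tag_cols = list(df.columns[1:])   # every column after user_name
def func(x):
    # column names ordered by descending value; NaN (the former 0s) sorts last
    val = x.sort_values(ascending=False).index.to_series()
    # blank out the names of the NaN columns (the boolean mask aligns on label)
    val[pd.isnull(x)] = ''
    # keep the original labels so apply() expands back into tag1..tag4
    return pd.Series(val.values, index=x.index)
df[tag_cols] = df[tag_cols].apply(func, axis=1)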
df

Out[28]:
  user_name  tag1  tag2  tag3  tag4
0     user1  tag3  tag1  tag2      
1     user2  tag4  tag2  tag1  tag3
2     user3  tag1  tag4            
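Why the blanks end up on the right: `sort_values` places NaN last by default (`na_position='last'`), and assigning through a boolean Series aligns the mask on index labels, so a mask built in the original column order still blanks the right names in the sorted series. A small check on user3's row, assuming the 0s have already been replaced with NaN:

```python
import numpy as np
import pandas as pd

# user3's row after df.replace(0, np.nan)
x = pd.Series([0.21, np.nan, np.nan, 0.19],
              index=['tag1', 'tag2', 'tag3', 'tag4'])

# Names ordered by value; NaN entries are placed last by default.
val = x.sort_values(ascending=False).index.to_series()

# pd.isnull(x) is in the original column order, but the boolean mask is
# aligned on labels, so tag2 and tag3 are blanked wherever they ended up.
val[pd.isnull(x)] = ''

print(list(val.values))  # ['tag1', 'tag4', '', '']
```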

I use NaN here rather than comparing against 0. A naive attempt at masking the zeros does not work, because `mask = val == 0` tests the column names (strings), not the values, so nothing gets blanked:

In [32]:
def func(x):
    val = x.sort_values(ascending=False).index.to_series()
    mask = val == 0    # compares the column names (strings) with 0: never True
    val[mask] = ''
    return pd.Series(val.values, index=x.index)
tag_cols = list(df.columns[1:])
df[tag_cols] = df[tag_cols].apply(func, axis=1)
df

Out[32]:
  user_name  tag1  tag2  tag3  tag4
0     user1  tag3  tag1  tag2  tag4
1     user2  tag4  tag2  tag1  tag3
2     user3  tag1  tag4  tag3  tag2
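A sketch of the same masking step with the 0s left in place, comparing the row values `x` rather than the labels in `val`. It assumes the zeros are exact, as in the question's data; for zeros produced by arithmetic, a tolerance check such as `np.isclose` would be safer.

```python
import pandas as pd

# user3's row with the original 0s left in place
x = pd.Series([0.21, 0.0, 0.0, 0.19], index=['tag1', 'tag2', 'tag3', 'tag4'])

val = x.sort_values(ascending=False).index.to_series()

# Comparing the labels against 0 never matches: the labels are strings.
print((val == 0).any())  # False

# Comparing the values does match; the mask aligns on the column labels.
val[x == 0] = ''
print(list(val.values))  # ['tag1', 'tag4', '', '']
```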