Ale Ale - 2 months ago 22
Python Question

Generate Series with column names of DataFrame that match condition

I have a DataFrame with many columns containing True/False values. For example:

import pandas as pd
data = pd.DataFrame([[True, True, False],
                     [False, False, True],
                     [True, False, True],
                     [False, False, False],
                     [True, True, False]],
                    columns=['A', 'B', 'C'])


The real data has many more than these three columns.

I need to generate an additional column in which each value is a list of the names of all columns that are True in that row. For the example above, the result should be:

0 [A, B]
1 [C]
2 [A, C]
3 []
4 [A, B]
Name: X, dtype: object


Is there any magic trick in Pandas to achieve this without using nested loops (which is the only idea I had so far)?

Answer

You can use the apply method with axis=1 to go through the rows, using each boolean row as a mask to subset the column names:

data.apply(lambda r: data.columns[r].tolist(), axis=1)

#0    [A, B]
#1       [C]
#2    [A, C]
#3        []
#4    [A, B]
#dtype: object
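
Since the question asks for an additional column named X, here is a minimal sketch of one way to attach the result to the frame. It swaps the row-wise apply for a plain list comprehension over the underlying NumPy array, which may be faster on wide frames, though that is an assumption and not something measured here. The names cols and mask are just illustrative, and the sketch assumes every column in the frame is boolean at the time the mask is taken.

```python
import pandas as pd

data = pd.DataFrame([[True, True, False],
                     [False, False, True],
                     [True, False, True],
                     [False, False, False],
                     [True, True, False]],
                    columns=['A', 'B', 'C'])

# Capture the column names and the boolean values BEFORE adding X,
# so the new column does not end up inside its own mask.
cols = data.columns.to_numpy()
mask = data.to_numpy(dtype=bool)

# One list of column names per row; wrapping it in a Series keeps the
# lists as single object values and aligns them on the original index.
data['X'] = pd.Series([cols[row].tolist() for row in mask],
                      index=data.index, name='X')

print(data['X'])
```

If the frame also holds non-boolean columns, select the boolean ones first (for example with data.select_dtypes(include='bool')) and build cols and mask from that subset.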