piRSquared - 1 year ago
Python Question

Retrieve a series of slices of column headers based on the truth of DataFrame values

Consider the dataframe:


import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.choice((0, 1), (3, 3)),
                  columns=['blah', 'meep', 'zimp'])



What is the most efficient way to slice the column headers with each row of the dataframe (keeping the headers whose values are truthy), both for this example and at scale?

Expected results:

0 [meep]
1 [blah]
2 [blah, zimp]
dtype: object
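Because the question draws its values at random, here is a minimal sketch on a fixed frame (hypothetical values chosen to reproduce the expected results above). It boolean-indexes the column labels with each row of the frame cast to bool. This is one straightforward baseline, not one of the answers compared below.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data that reproduces the expected results above
df = pd.DataFrame([[0, 1, 0],
                   [1, 0, 0],
                   [1, 0, 1]],
                  columns=['blah', 'meep', 'zimp'])

cols = df.columns.to_numpy()        # column labels as a NumPy array
mask = df.to_numpy().astype(bool)   # truthiness of every cell

# Use each boolean row of the mask to select the matching labels
result = pd.Series([cols[row].tolist() for row in mask], index=df.index)
print(result)
```

The per-row list comprehension runs in Python, so its cost grows with the number of rows; it is meant as a readable reference point rather than a tuned solution.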

At Scale

I confirmed that @jezrael's answer, @boud's answer, and mine all produce the same results. Below is the dataframe I used to test how each solution scales:

from string import ascii_letters as letters  # Python 3; string.letters was Python 2 only
import pandas as pd
import numpy as np

mux = pd.MultiIndex.from_product([list(letters), list(letters)])

df = pd.DataFrame(np.arange(52 ** 4).reshape(52 ** 2, -1) % 3 % 2, mux, mux)
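`string.letters` exists only in Python 2. A Python 3 sketch of the same test frame, assuming `string.ascii_letters` (the 52 ASCII letters) as the replacement, with a quick check of the shape and of the 0/1 pattern that `% 3 % 2` produces. The name `big` is hypothetical, used only to avoid overwriting the small `df`.

```python
from string import ascii_letters  # Python 3 replacement for string.letters

import numpy as np
import pandas as pd

# 52 letters -> 52 * 52 = 2704 two-level labels, used for both rows and columns
mux = pd.MultiIndex.from_product([list(ascii_letters), list(ascii_letters)])

# n % 3 cycles 0, 1, 2 and the extra % 2 maps that to 0, 1, 0,
# so a cell is 1 exactly when its flat position n satisfies n % 3 == 1
values = np.arange(52 ** 4).reshape(52 ** 2, -1) % 3 % 2
big = pd.DataFrame(values, index=mux, columns=mux)

print(big.shape)                 # (2704, 2704)
print(big.iloc[0, :4].tolist())  # [0, 1, 0, 0]
```

Roughly a third of the 7,311,616 cells are 1, so each row keeps about 900 of its 2704 column labels.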

Setup for @boud's solution:

s = pd.Series([[x] for x in df], df.columns)
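The code that consumes `s` is not shown here. One plausible use, offered as an assumption rather than as @boud's actual code, is a dot product: with object values, `1 * [label]` is `[label]`, `0 * [label]` is `[]`, and adding lists concatenates them, so each row's sum collects its truthy labels. A sketch on a hypothetical fixed frame:

```python
import pandas as pd

# Hypothetical fixed data matching the expected results in the question
df = pd.DataFrame([[0, 1, 0], [1, 0, 0], [1, 0, 1]],
                  columns=['blah', 'meep', 'zimp'])

# One single-element list per column label, indexed by the columns
s = pd.Series([[x] for x in df], df.columns)

# Row-by-s dot product: sum of (cell * [label]) concatenates the kept labels
result = df.dot(s)
print(result)
```

With MultiIndex columns the same construction yields lists of label tuples.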

Setup for my solution (piRSquared):

num = df.columns.nlevels
lvls = list(range(num))
rlvls = [x * -1 - 1 for x in lvls]
xsl = lambda x: x.xs(x.name).index.tolist()
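How these helpers are combined is not shown either. As a hedged sketch, assuming `xsl` is applied to each row group of a stacked frame, the single-level case could look like this. `num`, `lvls`, and `rlvls` appear to generalize the level handling to MultiIndex columns; this sketch sticks to the small frame's single column level.

```python
import pandas as pd

# Hypothetical fixed data matching the expected results in the question
df = pd.DataFrame([[0, 1, 0], [1, 0, 0], [1, 0, 1]],
                  columns=['blah', 'meep', 'zimp'])

# Same helper as in the setup: drop the group's own key (its row label)
# from the index and return the remaining column labels as a list
xsl = lambda x: x.xs(x.name).index.tolist()

stacked = df.stack()                    # Series indexed by (row label, column label)
truthy = stacked[stacked.astype(bool)]  # keep only the nonzero cells
result = truthy.groupby(level=0).apply(xsl)
print(result)
```

A row with no truthy cells forms no group and is missing from `result`; `result.reindex(df.index)` restores it as NaN.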


(Image not available: results at scale)

Small df

(Image not available: results for the small df)

Answer

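You can use mul to multiply each 0/1 column by its header (1 * 'meep' is 'meep', 0 * 'meep' is the empty string), then drop the empty strings with filter in a list comprehension: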

df = df.mul(df.columns.to_series(), axis=1)
print(df)
   blah  meep  zimp
0        meep      
1  blah            
2  blah        zimp

print([list(filter(None, x)) for x in df.values.tolist()])
[['meep'], ['blah'], ['blah', 'zimp']]

print(pd.Series([list(filter(None, x)) for x in df.values.tolist()], index=df.index))
0          [meep]
1          [blah]
2    [blah, zimp]
dtype: object
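The same multiply-then-filter idea can also be written on NumPy object arrays, so that every product is plain Python int * str rather than depending on how a given pandas version handles an integer column times a string. A minimal sketch on a fixed frame (hypothetical values matching the output above):

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data matching the output above
df = pd.DataFrame([[0, 1, 0], [1, 0, 0], [1, 0, 1]],
                  columns=['blah', 'meep', 'zimp'])

# As object arrays, cells are Python ints and labels are Python strs, so the
# broadcast product is int * str: 1 * 'meep' == 'meep', 0 * 'meep' == ''
labels = df.to_numpy(dtype=object) * df.columns.to_numpy(dtype=object)

# filter(None, row) drops the empty strings, keeping only the truthy headers
result = pd.Series([list(filter(None, row)) for row in labels.tolist()],
                   index=df.index)
print(result)
```

This relies on the cells being exactly 0 or 1 and the headers being non-empty strings: a cell value of 2 would produce 'meepmeep'.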