hans glick hans glick - 5 months ago 196
Python Question

Pandas Python, select columns based on rows conditions

I have a dataframe:

import pandas as pd
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
0 1 2 3
0 1.489198 1.329603 1.590124 1.123505
1 0.024017 0.581033 2.500397 0.156280


I want to select the columns which for there is at least one row with a value greater than
2
. I tried the following, but it did not work as expected.

df[df.columns[df.iloc[(0,1)]>2]]


In this toy example my expected output would be:

2
1.590124
2.500397

Answer

Use gt and any to filter the df:

In [287]:
df.ix[:,df.gt(2).any()]

Out[287]:
          2
0  1.590124
1  2.500397

Here we use ix to select all rows, the first : and the next arg is a boolean mask of the columns that meet the condition:

In [288]:
df.gt(2)

Out[288]:
       0      1      2      3
0  False  False  False  False
1  False  False   True  False

In [289]:
df.gt(2).any()

Out[289]:
0    False
1    False
2     True
3    False
dtype: bool

In your example what you did was select the cell value for the first row and second column, you then tried to use this to mask the columns but this just returned the first column hence why it didn't work:

In [291]:
df.iloc[(0,1)]

Out[291]:
1.3296030000000001

In [293]:
df.columns[df.iloc[(0,1)]>2]

Out[293]:
'0'
Comments