smiss smiss - 7 months ago 51
Python Question

Pandas Series with column names for each value above a minimum

I am trying to build a new Series from a DataFrame. For each row, the Series should contain the name of the first column, scanning from the left, whose value is at or above some minimum, like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(5, 6)), columns=list('ABCDEF'))

>>> df

   A  B  C  D  E  F
0  2  4  6  8  8  4
1  2  0  9  7  7  1
2  1  7  7  7  3  0
3  5  4  4  0  1  7
4  9  6  1  5  1  5


min = 3


Expected Output:

0 B
1 C
2 B
3 A
4 A
dtype: object


Here row 0 of the output is "B" because, in row 0 of the DataFrame, "B" is the leftmost column whose value is greater than or equal to min = 3.

I know that I can use df.idxmin(axis=1) to get the column name of the minimum for each row, but I have no clue how to tackle this more complex problem.

Thanks for help or hints!

Answer

UPDATE - column label of the first element in each row that satisfies the condition:

more elegant and more efficient version from @DSM:

In [156]: (df>=3).idxmax(1)
Out[156]:
0    B
1    C
2    B
3    A
4    A
dtype: object
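
Why this works: df >= 3 is a boolean DataFrame, and idxmax returns the label of the first occurrence of the maximum, which for booleans is the first True in each row. A minimal sketch on the exact DataFrame from the question follows; the names threshold, mask, first_col and first_col_safe are my own, not from the original answer.

```python
import pandas as pd

# The DataFrame shown in the question, typed in by hand so the result is reproducible.
df = pd.DataFrame(
    [[2, 4, 6, 8, 8, 4],
     [2, 0, 9, 7, 7, 1],
     [1, 7, 7, 7, 3, 0],
     [5, 4, 4, 0, 1, 7],
     [9, 6, 1, 5, 1, 5]],
    columns=list("ABCDEF"),
)
threshold = 3  # hypothetical name, chosen to avoid shadowing the built-in min

mask = df >= threshold            # boolean DataFrame, True where the value qualifies
first_col = mask.idxmax(axis=1)   # label of the first True in each row
print(first_col)

# Caveat: a row with no True at all still gets the first column label back,
# because every entry ties at False. Blank those rows out explicitly.
first_col_safe = first_col.where(mask.any(axis=1))
```

One thing to watch for: if a row may contain no value at or above the threshold, the plain idxmax result silently reports the first column for that row, so the mask.any(axis=1) guard is worth keeping.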

my version:

In [149]: df[df>=3].apply(lambda x: x.first_valid_index(), axis=1)
Out[149]:
0    B
1    C
2    B
3    A
4    A
dtype: object

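Old answer - column label of the smallest element that is >= 3 in each row (the output below does not match the df printed in the question, presumably because it came from a different random sample):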

In [27]: df[df>=3].idxmin(1)
Out[27]:
0    E
1    A
2    C
3    C
4    F
dtype: object
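
To make the difference between the old answer and the update concrete, here is a sketch that runs both on the DataFrame printed in the question. The old answer picks the smallest qualifying value in each row, while the update picks the leftmost qualifying value; the variable names are my own.

```python
import pandas as pd

# Same data as the DataFrame printed in the question.
df = pd.DataFrame(
    [[2, 4, 6, 8, 8, 4],
     [2, 0, 9, 7, 7, 1],
     [1, 7, 7, 7, 3, 0],
     [5, 4, 4, 0, 1, 7],
     [9, 6, 1, 5, 1, 5]],
    columns=list("ABCDEF"),
)
threshold = 3  # hypothetical name for the question's min

# Old answer: column holding the smallest value among those >= threshold.
# NaN entries (values below the threshold) are skipped by idxmin.
smallest_qualifying = df[df >= threshold].idxmin(axis=1)

# Update: leftmost column whose value is >= threshold.
leftmost_qualifying = (df >= threshold).idxmax(axis=1)

comparison = pd.DataFrame(
    {"smallest": smallest_qualifying, "leftmost": leftmost_qualifying}
)
print(comparison)
```

On this data the two agree only for row 0 (both "B"); row 2, for example, gives "E" for the smallest qualifying value (3) but "B" for the leftmost one (7).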