smiss - 1 year ago 123

Python Question

I am trying to build a new Series from a DataFrame. For each row, the Series should contain the name of the first column, counting from the left, whose value is at or above some threshold. For example:

`df = pd.DataFrame(np.random.randint(0,10,size=(5, 6)), columns=list('ABCDEF'))`

    >>> df
       A  B  C  D  E  F
    0  2  4  6  8  8  4
    1  2  0  9  7  7  1
    2  1  7  7  7  3  0
    3  5  4  4  0  1  7
    4  9  6  1  5  1  5

min = 3

Expected Output:

    0    B
    1    C
    2    B
    3    A
    4    A
    dtype: object

Here row 0 of the output is "B" because, in row 0 of the DataFrame, column "B" is the leftmost column whose value is greater than or equal to `min = 3`.

I know that I can use `df.idxmin(axis = 1)` to get the column name of the minimum value in each row.

Thanks for help or hints!

Answer

**UPDATE - index of the first element in each row, satisfying condition:**

more elegant and more efficient version from @DSM:

```
In [156]: (df>=3).idxmax(1)
Out[156]:
0 B
1 C
2 B
3 A
4 A
dtype: object
```
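To see why the one-liner works, here is a minimal sketch that splits it into steps. It rebuilds `df` from the values printed in the question (the original was random), and the names `threshold`, `mask`, `first_col` and `first_col_safe` are only illustrative. `df >= 3` gives a boolean frame, and `idxmax` along `axis=1` returns the label of the first maximum, which is the first `True` from the left. One caveat: a row with no value `>= 3` is all `False`, so `idxmax` still reports the first column for it. The last step is one possible way to blank those rows out.

```python
import pandas as pd

# The question's DataFrame, rebuilt from the printed values.
df = pd.DataFrame(
    [[2, 4, 6, 8, 8, 4],
     [2, 0, 9, 7, 7, 1],
     [1, 7, 7, 7, 3, 0],
     [5, 4, 4, 0, 1, 7],
     [9, 6, 1, 5, 1, 5]],
    columns=list("ABCDEF"),
)

threshold = 3  # called `min` in the question; renamed to avoid shadowing the builtin

mask = df >= threshold            # boolean DataFrame, True where the value qualifies
first_col = mask.idxmax(axis=1)   # label of the first True in each row

# A row with no qualifying value is all False, and idxmax would report
# the first column for it; replace such entries with NaN instead.
first_col_safe = first_col.where(mask.any(axis=1))

print(first_col_safe)
```

On this data every row has at least one value `>= 3`, so `first_col` and `first_col_safe` are identical. The extra `where` step only matters when some row might have no match.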

my version:

```
In [149]: df[df>=3].apply(lambda x: x.first_valid_index(), axis=1)
Out[149]:
0 B
1 C
2 B
3 A
4 A
dtype: object
```
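For comparison, a small sketch of what this version does under the hood, again on the values printed in the question. `df[df >= 3]` keeps qualifying values and turns the rest into NaN, and `first_valid_index()` returns the label of the first non-NaN entry in each row. Unlike the `idxmax` variant, it returns `None` when a row has no qualifying value. The names `masked`, `via_apply`, `via_idxmax` and `no_match` are only illustrative.

```python
import pandas as pd

# The question's DataFrame, rebuilt from the printed values.
df = pd.DataFrame(
    [[2, 4, 6, 8, 8, 4],
     [2, 0, 9, 7, 7, 1],
     [1, 7, 7, 7, 3, 0],
     [5, 4, 4, 0, 1, 7],
     [9, 6, 1, 5, 1, 5]],
    columns=list("ABCDEF"),
)

masked = df[df >= 3]  # values below 3 become NaN
via_apply = masked.apply(lambda row: row.first_valid_index(), axis=1)
via_idxmax = (df >= 3).idxmax(axis=1)

# Both approaches pick the same column for every row of this data.
print((via_apply == via_idxmax).all())

# A row where nothing qualifies: every entry is NaN after masking,
# so first_valid_index() has nothing to return and gives None.
row = pd.Series([1, 2, 0], index=list("ABC"))
no_match = row.where(row >= 3).first_valid_index()
print(no_match)
```

The `apply` version calls a Python function once per row, which is the main reason the vectorized `idxmax` version tends to be faster on large frames.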

**Old answer - index of the minimum element for each row:**

```
In [27]: df[df>=3].idxmin(1)
Out[27]:
0 E
1 A
2 C
3 C
4 F
dtype: object
```
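The output above does not match the `df` printed in the question (presumably it came from a different random draw). As a sketch of how this older approach differs from the updated one, here it is applied to the question's printed values. It returns the column of the smallest value that is `>= 3`, which is not necessarily the leftmost qualifying column. The names `smallest_qualifying` and `leftmost_qualifying` are only illustrative.

```python
import pandas as pd

# The question's DataFrame, rebuilt from the printed values.
df = pd.DataFrame(
    [[2, 4, 6, 8, 8, 4],
     [2, 0, 9, 7, 7, 1],
     [1, 7, 7, 7, 3, 0],
     [5, 4, 4, 0, 1, 7],
     [9, 6, 1, 5, 1, 5]],
    columns=list("ABCDEF"),
)

# Old approach: column of the smallest value among those >= 3 (NaNs are skipped).
smallest_qualifying = df[df >= 3].idxmin(axis=1)

# Updated approach: leftmost column whose value is >= 3.
leftmost_qualifying = (df >= 3).idxmax(axis=1)

print(smallest_qualifying.tolist())   # ['B', 'D', 'E', 'B', 'D']
print(leftmost_qualifying.tolist())   # ['B', 'C', 'B', 'A', 'A']
```

Only row 0 agrees between the two, which is why the old answer does not produce the expected output from the question.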

Source (Stack Overflow)