user2543645 user2543645 - 1 month ago 4
Python Question

How to use argmin with groupby in pandas

Suppose I have a pandas dataframe like this:

cat val
0 a 1
1 a 6
2 a 12
3 b 2
4 b 5
5 b 11
6 c 4
7 c 22


And I want to know, for each category (each value of 'cat') what is the position where the value is closest to a given value, say 5.5. I can subtract off my target value and take the absolute value, giving me something like this:

cat val val_delt
0 a 1 4.5
1 a 6 0.5
2 a 12 6.5
3 b 2 3.5
4 b 5 0.5
5 b 11 5.5
6 c 4 1.5
7 c 22 16.5


But I'm stuck about where to go next. My first thought was to use argmin() with groupby(), but this gives an error:

In [375]: df.groupby('cat').val_delt.argmin()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-375-a2c3dbc43c50> in <module>()
----> 1 df.groupby('cat').val_delt.argmin()

TypeError: 'Series' object is not callable


I could, of course, come up with some horrible hacky thing in standard python where I iterate over all values of cat, then select the subset of my data corresponding to that value, perform the argmin operation then figure out where in the original dataframe that row was. But there's got to be a more elegant way to do this.

What I want as an output is either something like this:

cat val
1 a 6
4 b 5
6 c 4


or at least some structure that contains that relevant information (eg - {'a':1, 'b':4, 'c':6} ). I don't care if I get back the index value or the index position, but I need one of the two. I don't care about getting back the value - I can always get that later once I have the index subset.

Answer

argmin() is not an agg function, you can use apply to get the closest index of every group:

txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""

import io

df = pd.read_csv(io.BytesIO(txt), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.argmin())
df.ix[idx, :]

output:

cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5
Comments