user2543645 - 4 months ago 14

Python Question

Suppose I have a pandas dataframe like this:

`cat val`

0 a 1

1 a 6

2 a 12

3 b 2

4 b 5

5 b 11

6 c 4

7 c 22

And I want to know, for each category (each value of 'cat') what is the position where the value is closest to a given value, say 5.5. I can subtract off my target value and take the absolute value, giving me something like this:

`cat val val_delt`

0 a 1 4.5

1 a 6 0.5

2 a 12 6.5

3 b 2 3.5

4 b 5 0.5

5 b 11 5.5

6 c 4 1.5

7 c 22 16.5

But I'm stuck about where to go next. My first thought was to use argmin() with groupby(), but this gives an error:

`In [375]: df.groupby('cat').val_delt.argmin()`

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-375-a2c3dbc43c50> in <module>()

----> 1 df.groupby('cat').val_delt.argmin()

TypeError: 'Series' object is not callable

I could, of course, come up with some horrible hacky thing in standard python where I iterate over all values of cat, then select the subset of my data corresponding to that value, perform the argmin operation then figure out where in the original dataframe that row was. But there's got to be a more elegant way to do this.

What I want as an output is either something like this:

`cat val`

1 a 6

4 b 5

6 c 4

or at least some structure that contains that relevant information (eg - {'a':1, 'b':4, 'c':6} ). I don't care if I get back the index value or the index position, but I need one of the two. I don't care about getting back the value - I can always get that later once I have the index subset.

Answer

`argmin()`

is not an agg function, you can use apply to get the closest index of every group:

```
txt = """ cat val
0 a 1
1 a 6
2 a 12
3 b 2
4 b 5
5 b 11
6 c 4
7 c 22"""
import io
df = pd.read_csv(io.BytesIO(txt), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.argmin())
df.ix[idx, :]
```

output:

```
cat val val_delt
1 a 6 0.5
4 b 5 0.5
6 c 4 1.5
```