luisfer - 4 months ago 31

Python Question

This is a fragment of the dataframe I have:

| Title | Age |
|-------|-----|
| Mr.   | 30  |
| Mr.   | NaN |
| Mr.   | 32  |
| Mrs.  | 28  |
| Mrs.  | 16  |
| Mr.   | 34  |
| Mrs.  | NaN |

Edit: I added the last row to clarify the question.

I want to impute the NaNs (second and last rows). For the second row, the fill value should be the mean of the other "Mr." rows in the dataframe, which is 32 in this case. For the last row, it should be the mean of the other "Mrs." rows, which is 22.

Calculating the mean for one title is as easy as:

`value = df.loc[df["Title"] == "Mr."]["Age"].mean()`

So I wrote a function called agefun:

`def agefun(df, t):`

`    return df.loc[df["Title"] == t]["Age"].mean()`

It works. Now, how can I use this function with `fillna()`? I'd like something like:

`df['Age'].fillna(agefun(df, this_row_title))`

But of course that doesn't work: I don't know how to tell the function that I want the value corresponding to the Title in that specific row.
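To make the problem concrete, here is a minimal sketch. The frame is rebuilt by hand from the table above (its construction is an assumption for illustration), and `agefun` is the function from the question. Passing a single scalar to `fillna()` fills every NaN with that same value, whatever the row's Title is:

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's table (an assumption for illustration).
df = pd.DataFrame({
    "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
    "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
})

def agefun(df, t):
    # Mean age of all rows with the given title (NaNs are skipped by mean()).
    return df.loc[df["Title"] == t, "Age"].mean()

# A scalar argument is applied to every NaN, so the last "Mrs." row
# also receives the "Mr." mean instead of the "Mrs." mean.
filled = df["Age"].fillna(agefun(df, "Mr."))
print(filled.tolist())
```

With this data, the last row ends up as 32 rather than the desired 22.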

How can this be performed?

Answer

`transform` returns a result with the same shape as the original series in the dataframe, so it can be assigned straight back to the column.

```
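>>> # Select the 'Age' column explicitly so other columns in the frame are not transformed.
>>> df['Age'] = df.groupby('Title')['Age'].transform(lambda group: group.fillna(group.mean()))
>>> df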
Title Age
0 Mr. 30
1 Mr. 32 # (30 + 32 + 34) / 3 = 32
2 Mr. 32
3 Mrs. 28
4 Mrs. 16
5 Mr. 34
```

In the example above, it keeps all of the values unchanged except for the one `NaN` value on the second row, which it fills by calculating the mean for the group, i.e. the mean value of all rows where the `Title` is `Mr.`.
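As a fuller sketch, the snippet below rebuilds the seven-row frame from the question, including the last `Mrs.` row added in the edit, and applies the same `groupby`/`transform` idea. The frame construction and the `group_means` name are assumptions made for illustration; only the `Title` and `Age` columns come from the question. This variant computes the per-group means first and then passes them to `fillna()`, which accepts a Series and aligns it on the index.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's table (an assumption for illustration).
df = pd.DataFrame({
    "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
    "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
})

# Per-title mean, broadcast back to one value per row (mean() skips NaNs).
group_means = df.groupby("Title")["Age"].transform("mean")

# fillna() aligns the Series on the index, so each NaN receives
# the mean of its own Title group.
df["Age"] = df["Age"].fillna(group_means)

print(df["Age"].tolist())
# [30.0, 32.0, 32.0, 28.0, 16.0, 34.0, 22.0]
```

Here the second row is filled with 32 (the "Mr." mean) and the last row with 22 (the "Mrs." mean), matching the values the question asks for.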