luisfer - 1 year ago 175

# How to use a user function to fillna() in pandas

This is a fragment of the dataframe I have:

```
Title | Age
------+--------
Mr.   | 30
Mr.   | NaN
Mr.   | 32
Mrs.  | 28
Mrs.  | 16
Mr.   | 34
Mrs.  | NaN
```
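To try the approaches on this page, the fragment can be rebuilt as a DataFrame. This is a minimal sketch that assumes the missing ages are real `NaN` values (`numpy.nan`) rather than the string `"NaN"`; the column names come from the table above.

```python
import numpy as np
import pandas as pd

# The fragment from the question; np.nan marks the missing ages.
df = pd.DataFrame(
    {
        "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
        "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
    }
)

print(df["Age"].isna().sum())  # 2 missing ages (rows 1 and 6)
print(df["Age"].dtype)         # float64: a column holding NaN is stored as float
```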

Edit: I added the last row to clarify the question.

I want to impute the NaNs in the second and last rows. The second row should get the mean age of the other "Mr." rows in the dataframe, which here is 32 ((30 + 32 + 34) / 3); the last row should get the mean age of the other "Mrs." rows, which is 22 ((28 + 16) / 2).

Calculating the mean is as easy as:

```python
value = df.loc[df["Title"] == "Mr.", "Age"].mean()
```

So I wrote a function called agefun:

```python
def agefun(df, t):
    # Mean of "Age" over all rows whose "Title" equals t (NaN values are skipped)
    return df.loc[df["Title"] == t, "Age"].mean()
```
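As a quick check, calling the function once per title on the sample data should reproduce the two means the question expects. This sketch rebuilds the fragment with `numpy.nan` for the missing ages, which is an assumption about how the real data stores them.

```python
import numpy as np
import pandas as pd

def agefun(df, t):
    # Mean of "Age" over the rows whose "Title" is t; mean() skips NaN by default.
    return df.loc[df["Title"] == t, "Age"].mean()

df = pd.DataFrame(
    {
        "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
        "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
    }
)

print(agefun(df, "Mr."))   # 32.0, i.e. (30 + 32 + 34) / 3
print(agefun(df, "Mrs."))  # 22.0, i.e. (28 + 16) / 2
```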

It works. Now, how can I use this function with `fillna()`? I'd like something like:

```python
# Pseudocode: this_row_title stands for "the Title of the row being filled"
df['Age'].fillna(agefun(df, this_row_title))
```

Of course this doesn't work: I don't know how to tell the function to use the Title of that specific row.

How can this be performed?
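One way to keep using `agefun` with `fillna()` is to build a Series that holds, for every row, the fill value for that row's title, and pass it as the fill value: when `fillna()` receives a Series, it fills each `NaN` with the entry at the same index label. This is a minimal sketch assuming the sample data uses `numpy.nan`; the names `title_means` and `fill_values` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def agefun(df, t):
    # Mean of "Age" over the rows whose "Title" is t (NaN values are skipped).
    return df.loc[df["Title"] == t, "Age"].mean()

df = pd.DataFrame(
    {
        "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
        "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
    }
)

# Call agefun once per distinct title: {"Mr.": 32.0, "Mrs.": 22.0}
title_means = {t: agefun(df, t) for t in df["Title"].unique()}

# Look up each row's title, giving one fill value per row (aligned on df's index).
fill_values = df["Title"].map(title_means)

# fillna() with a Series replaces each NaN by the value at the same index label.
df["Age"] = df["Age"].fillna(fill_values)
print(df)
```

Computing the means once per distinct title, rather than calling `agefun` for every row, avoids rescanning the whole frame for each row.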

`transform()` returns a result with the same shape (and the same index) as the original series, so it can be assigned straight back to a column of the dataframe.
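To see that difference in shape, a small sketch comparing an aggregation with `transform()` on the sample data (again assuming the missing ages are `numpy.nan`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
        "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
    }
)

grouped = df.groupby("Title")["Age"]

# An aggregation collapses each group to a single value: one entry per Title.
per_title = grouped.mean()

# transform() broadcasts each group's result back onto that group's rows,
# so the output lines up with df row for row.
per_row = grouped.transform("mean")

print(per_title.shape)                 # (2,)
print(per_row.shape)                   # (7,)
print(per_row.index.equals(df.index))  # True
```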

```
df['Age'] = df.groupby('Title')['Age'].transform(lambda group: group.fillna(group.mean()))

>>> df
  Title   Age
0   Mr.  30.0
1   Mr.  32.0  # (30 + 32 + 34) / 3 = 32
2   Mr.  32.0
3  Mrs.  28.0
4  Mrs.  16.0
5   Mr.  34.0
6  Mrs.  22.0  # (28 + 16) / 2 = 22
```

In the example above, all existing values are kept unchanged; only the `NaN` values are filled, each with the mean of its own `Title` group. For example, the `NaN` on the second row is filled with the mean of all rows where the `Title` is `Mr.`.
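A variant that should give the same result: compute each row's group mean once with `transform("mean")` and hand that Series to `fillna()`. Using the built-in `"mean"` name avoids calling a Python lambda for every group, which tends to help on larger frames; this is a sketch on the sample data (with `numpy.nan` assumed for missing ages), not a benchmark.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Title": ["Mr.", "Mr.", "Mr.", "Mrs.", "Mrs.", "Mr.", "Mrs."],
        "Age": [30, np.nan, 32, 28, 16, 34, np.nan],
    }
)

# Per-row mean age of the row's Title group; NaN ages are ignored in the mean.
group_means = df.groupby("Title")["Age"].transform("mean")

# Only the NaN entries are replaced; existing ages are left untouched.
df["Age"] = df["Age"].fillna(group_means)
print(df)
```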
