Patrick Allo - 4 months ago 41
Python Question

Merge rows in DataFrame by removing nan's after groupby

Given a DataFrame like the following:

import numpy as np
import pandas as pd
from pandas import DataFrame

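# Two-level index: outer level "Project", inner level "Ord" (row order within a project)
idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=['Project', 'Ord'])
df = DataFrame({'a': ["foo", np.nan, np.nan, "bar"],
                'b': [np.nan, "one", "two", np.nan]},
               index=idx)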

                 a    b
Project   Ord
Project 1 1    foo  NaN
          2    NaN  one
Project 2 1    NaN  two
          2    bar  NaN

I would like to merge the rows that share the same outer index value (note that within each project, every column contains exactly one non-NaN value).

The current solution I have involves two groupby-operations:

df.index = df.index.droplevel(1)

and gives me the intended result:

             a    b
Project 1  foo  one
Project 2  bar  two
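The groupby calls themselves did not survive in the post; only the droplevel line is shown. As a hedged guess at what a two-groupby solution could look like (a reconstruction, not necessarily the poster's code), one option back-fills each column within a project, keeps the first row of each project, and then drops the Ord level:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (outer level: Project, inner level: Ord).
idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)

# First groupby: back-fill each column within a project, so the first row
# of each project picks up any non-NaN value from the rows below it.
filled = df.groupby(level="Project").bfill()

# Second groupby: keep only the first (now fully populated) row per project.
out = filled.groupby(level="Project").head(1).copy()

# Drop the Ord level so the index holds only the project names.
out.index = out.index.droplevel(1)
print(out)
```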

Having to use two groupby-operations seems excessive, since all I need is an aggregation function that returns the single non-NaN value from each group's column. However, I cannot think of a way to use dropna as an aggregation function.
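One way to use dropna inside an aggregation is to wrap it in a small helper and pass that helper to agg, which calls it once per column of each group. This is a minimal sketch, assuming (as the question states) that each column has at most one non-NaN value per project; the helper name single_non_null is hypothetical:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (outer level: Project, inner level: Ord).
idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)


def single_non_null(s: pd.Series):
    """Hypothetical helper: return the first non-NaN value of s, or NaN if there is none."""
    non_null = s.dropna()
    return non_null.iloc[0] if len(non_null) else np.nan


# A single groupby: agg applies the helper to each column of each project.
merged = df.groupby(level="Project").agg(single_non_null)
print(merged)
```

Guarding against an empty result keeps the helper from raising IndexError on a group whose column is entirely NaN; a bare lambda s: s.dropna().iloc[0] would fail in that case.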


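The last() method of a groupby object returns the last valid (non-NaN) value in each column of each group. first() would accomplish the same thing in this case, because each column has only one valid value per group.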
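A minimal sketch of the answer's suggestion, rebuilding the question's DataFrame and grouping on the Project level of its MultiIndex:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (outer level: Project, inner level: Ord).
idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)

# GroupBy.first / GroupBy.last skip NaN by default, so each returns the
# single non-NaN value per column, and grouping by level drops "Ord".
first = df.groupby(level="Project").first()
last = df.groupby(level="Project").last()
print(last)
```

Both calls return one row per project with columns a and b; they would only differ if a project had more than one non-NaN value in the same column.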

