Patrick Allo - 29 days ago
Python Question

Merge rows in a DataFrame by removing NaNs after groupby

Given a DataFrame like the following:

import numpy as np
import pandas as pd
from pandas import DataFrame

idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=['Project', 'Ord'])
df = DataFrame({'a': ["foo", np.nan, np.nan, "bar"],
                'b': [np.nan, "one", "two", np.nan]},
               index=idx)

Out:

                 a    b
Project   Ord
Project 1 1    foo  NaN
          2    NaN  one
Project 2 1    NaN  two
          2    bar  NaN


I would like to merge the rows that share the same outer index level (note that in each case a column has only one non-NaN value per project).

The current solution I have involves two groupby-operations:

df.index = df.index.droplevel(1)
df.groupby(df.index).ffill().groupby(df.index).last()


and gives me the intended result:

Out:
             a    b
Project
Project 1  foo  one
Project 2  bar  two


Having to use two groupby operations seems excessive, since all I need is an aggregation function that returns the single non-NaN value from a list. However, I cannot think of a way to use dropna as an aggregation function.

Answer

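The last method on a groupby returns the last valid (non-NaN) value of each column within each group. first would accomplish the same thing in this case, because each column has only one valid value per project.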

df.groupby(level='Project').last()
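
As a self-contained check of the one-liner above, here is a minimal sketch that rebuilds the DataFrame from the question and compares last with first. The variable names merged_last and merged_first are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)

# Group on the outer index level; last() and first() both skip NaN,
# so each returns the single valid entry per column and project.
merged_last = df.groupby(level="Project").last()
merged_first = df.groupby(level="Project").first()

print(merged_last)
print(merged_last.equals(merged_first))
```

Grouping with level='Project' also avoids having to drop the 'Ord' level from the index beforehand.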

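
Regarding the question's idea of using dropna as an aggregation function: one possible way, sketched here under the assumption that a plain Python function passed to agg is acceptable, is to drop the NaNs inside that function and return the first remaining value. The helper name only_valid is hypothetical, and this is likely slower than the built-in last or first on large frames.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)


def only_valid(series):
    """Hypothetical helper: return the first non-NaN value, or NaN if none."""
    valid = series.dropna()
    return valid.iloc[0] if len(valid) else np.nan


# agg calls only_valid once per column within each project group.
merged = df.groupby(level="Project").agg(only_valid)
print(merged)
```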
