Onéira Onéira - 1 month ago 11
Python Question

Joining string of a columns over several index while keeping other colums

Here is an example data set:

>>> df1 = pandas.DataFrame({
"Name": ["Alice", "Marie", "Smith", "Mallory", "Bob", "Doe"],
"City": ["Seattle", None, None, "Portland", None, None],
"Age": [24, None, None, 26, None, None],
"Group": [1, 1, 1, 2, 2, 2]})

>>> df1
Age City Group Name
0 24.0 Seattle 1 Alice
1 NaN None 1 Marie
2 NaN None 1 Smith
3 26.0 Portland 2 Mallory
4 NaN None 2 Bob
5 NaN None 2 Doe


I would like to merge the Name column for all index of the same group while keeping the City and the Age wanting someting like:

>>> df1_summarised
Age City Group Name
0 24.0 Seattle 1 Alice Marie Smith
1 26.0 Portland 2 Mallory Bob Doe


I know those 2 columns (Age, City) will be NaN/None after the first index of a given group from the structure of my starting data.

I have tried the following:

>>> print(df1.groupby('Group')['Name'].apply(' '.join))
Group
1 Alice Marie Smith
2 Mallory Bob Doe
Name: Name, dtype: object


But I would like to keep the Age and City columns...

Answer

using dropna and assign with groupby

df1.dropna(subset=['Age', 'City']) \
   .assign(Name=df1.groupby('Group').Name.apply(' '.join).values)

enter image description here


timing
per request

enter image description here

Comments