mkln - 5 months ago
Python Question

Is there an "ungroup by" operation opposite to .groupby in pandas?

Suppose we start from this simple table, stored in a pandas dataframe:

    name  age  family
0   john    1       1
1  jason   36       1
2   jane   32       1
3   jack   26       2
4  james   30       2

Then I do

group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': 'mean'})

where name_join is a simple aggregating function for the names:

def name_join(list_names, concat='-'):
    return concat.join(list_names)

the output is:

   age             name
1   23  john-jason-jane
2   28       jack-james

Now the question.

Is there a quick, efficient way to get to the following from the aggregated table?

    name  age  family
0   john   23       1
1  jason   23       1
2   jane   23       1
3   jack   28       2
4  james   28       2

(Note: the numbers are just examples; I don't care about the information I lose by averaging in this specific example.)

The way I thought I could do it does not look too efficient:

  1. create empty dataframe

  2. from every line in group_df, separate the names

  3. return a dataframe with as many rows as there are names in the starting row

  4. append the output to the empty dataframe
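
Roughly, those four steps might look like the sketch below. It assumes the aggregated table is rebuilt by hand as group_df with 'family' as its index, and the names pieces and ungrouped are only illustrative. The per-row frames are collected in a list and combined once with pd.concat rather than appended one at a time, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Aggregated table from the example above (index is 'family').
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "jack-james"]},
    index=pd.Index([1, 2], name="family"),
)

pieces = []  # step 1: start with an empty container
for family, row in group_df.iterrows():
    names = row["name"].split("-")  # step 2: separate the names
    # step 3: one small frame with as many rows as there are names
    pieces.append(pd.DataFrame({"name": names, "age": row["age"], "family": family}))

# step 4: combine all pieces into a single frame
ungrouped = pd.concat(pieces, ignore_index=True)
print(ungrouped)
```

This works, but it builds one small dataframe per aggregated row in a Python loop, which is the part that does not scale well.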


It may not be helpful to think of the operation as the "opposite" of groupby.

You are splitting a string into pieces while keeping each piece's association with 'family'. This old answer of mine does the job.

Just set 'family' as the index column first, refer to the link above, and then reset_index() at the end to get your desired result.
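
As a minimal sketch of that idea, assuming pandas 0.25 or later: the block below splits each joined string into a list and uses DataFrame.explode to give every name its own row. This is not necessarily the technique used in the linked answer; explode is swapped in here because it is the shortest way I know to express the split. The names agg and result are only illustrative, and the aggregated table is rebuilt by hand with 'family' already set as the index.

```python
import pandas as pd

# Aggregated table from the question, with 'family' as the index.
agg = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "jack-james"]},
    index=pd.Index([1, 2], name="family"),
)

result = (
    agg.assign(name=agg["name"].str.split("-"))  # each cell becomes a list of names
    .explode("name")                             # one row per list element; index and age repeat
    .reset_index()                               # turn 'family' back into a column
    [["name", "age", "family"]]                  # restore the original column order
)
print(result)
```

Because 'family' is the index while the strings are split, every resulting row keeps its group label, and reset_index() at the end restores it as an ordinary column.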