Sergey Ivanov - 22 days ago
Python Question

Assign groupby-apply result to parent dataframe

I have the following data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

     A      B         C         D
0  foo    one  0.478183 -1.267588
1  bar    one  0.555985 -2.143590
2  foo    two -1.592865  1.251546
3  bar  three  0.174138 -0.708198
4  foo    two  0.302215 -0.219041
5  bar    two -0.034550 -0.965414
6  foo    one  1.310828 -0.388601
7  foo  three  0.357659 -1.610443


I'm trying to add another column holding a min-max normalized version of column C, computed separately within each group of A:

normed = df.groupby('A').apply(lambda x: (x['C']-min(x['C']))/(max(x['C'])-min(x['C'])))

A
bar  1    0.000000
     3    0.033396
     5    1.000000
foo  0    1.000000
     2    0.413716
     4    0.000000
     6    0.441061
     7    0.357787


Finally, I want to join this result back to df (using advice from a similar question):

df.join(normed, on='A', rsuffix='_normed')


However, I get an error:


ValueError: len(left_on) must equal the number of levels in the index of "right"


How can I add the normed result back to the dataframe df?

Answer

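You get this error because normed has a MultiIndex with two levels, while on='A' supplies only one join key. The first level holds the group keys from A (two distinct values, 'bar' and 'foo'); the second level is the original index of df.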

normed.index

Out[35]:

MultiIndex(levels=[['bar', 'foo'], [0, 1, 2, 3, 4, 5, 6, 7]],
           labels=[[0, 0, 0, 1, 1, 1, 1, 1], [1, 3, 5, 0, 2, 4, 6, 7]],
           names=['A', None])

You probably want to join on the original index, so you must drop the first level of the new index:

normed.index = normed.index.droplevel()

before joining:

df.join(normed, rsuffix='_normed')