E-ebola virus - 23 days ago
Python Question

How to get the sum of values from another DataFrame based on a column value in the first DataFrame?

I have a DataFrame df:

import pandas as pd

df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
>>> df
   Color  Value
0    Red    100
1    Red    150
2   Blue     50


I have a second DataFrame, dfmain:

dfmain = pd.DataFrame({'Color': ["Red","Blue","Yellow"]})
>>> dfmain
    Color
0     Red
1    Blue
2  Yellow


I want a result DataFrame with the sum of Value for each color in dfmain, using 0 for colors that do not appear in df.
My expected result is:

>>> result
    Color  sum
0     Red  250
1    Blue   50
2  Yellow    0


Right now I am using a loop, and it gets slow on large datasets. I would like an idiomatic, vectorized pandas (or NumPy) solution.

Answer

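You can use groupby to aggregate Value with sum, then reindex by dfmain.Color with fill_value=0 so that colors missing from df get 0, and finally reset_index to turn the colors back into a column: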

result = df.groupby('Color')['Value'].sum().reindex(dfmain.Color, fill_value=0).reset_index()
print(result)

    Color  Value
0     Red    250
1    Blue     50
2  Yellow      0
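A step-by-step sketch of the same groupby/sum/reindex chain, using the sample data from the question. One small difference from the one-liner: it passes name='sum' to reset_index so the totals column is called sum, matching the expected result in the question (the one-liner keeps the name Value). The map-based variant at the end is an alternative not taken from the answer: it looks up each color's total and fills colors missing from df with 0 while keeping any other columns dfmain may have. The variable names totals, aligned, result and result_map are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
dfmain = pd.DataFrame({'Color': ["Red", "Blue", "Yellow"]})

# Step 1: total Value per color in df (a Series indexed by Color)
totals = df.groupby('Color')['Value'].sum()

# Step 2: align the totals to dfmain's colors, in dfmain's order;
# colors that never appear in df (here Yellow) get 0 instead of NaN
aligned = totals.reindex(dfmain['Color'], fill_value=0)

# Step 3: move the Color index back into a column and call the totals 'sum'
result = aligned.reset_index(name='sum')
print(result)

# Alternative: map each color in dfmain to its total; unmatched colors
# become NaN, so fill them with 0 and cast back to integers
result_map = dfmain.assign(sum=dfmain['Color'].map(totals).fillna(0).astype(int))
print(result_map)
```

Both approaches avoid a Python-level loop: the grouping and the lookup are done by pandas internally. Both printed frames list Red 250, Blue 50 and Yellow 0 in dfmain's row order.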