Babyburger Babyburger - 11 days ago 5
Python Question

Joining rows based on value conditions

Assume a Pandas DataFrame like the following:

Year Bread Amount
-----------------------------
2008S1 white 50
2008S1 brown 30
2008S2 white 60
2008S2 brown 90
2009S1 brown 100
2009S1 golden 2000
2009S2 brown 20
2009S2 golden 1000


I wish to join the relevant year/bread combo together such that the amount is added together. It's okay to assume that every year+bread combo is unique. So the result would become:

Year Bread Amount
-----------------------------
2008 white 110
2008 brown 120
2009 brown 120
2009 golden 3000


What's a good way to do this? I thought of using boolean indexing to splice the rows containing yearS2 and bread (deleting the rows and extracting the values). Then look for the relevant rows (yearS1 and same bread) again through boolean indexing, to perform the addition. This sounds like a lot of work which I assume can be handled more elegantly.

Answer

You can use groupby. Grouping will be done on the first 4 characters of the Year column and the Bread column as follows:

df.groupby([df['Year'].str[:4], 'Bread']).sum()
Out: 
             Amount
Year Bread         
2008 brown      120
     white      110
2009 brown      120
     golden    3000

Or as regular columns:

df.groupby([df['Year'].str[:4], 'Bread'], as_index=False).sum()
Out: 
    Bread  Amount
0   brown     120
1   white     110
2   brown     120
3  golden    3000