Jeff Jeff - 3 months ago 18
Python Question

Python pandas grouping issue

Am I doing something wrong here, or is this a bug?

df2 is a copy/slice of df1. But the minute I group it by column A and take the last value of column C in each group, creating a new column 'NewMisteryColumn', df1 also gets a new 'NewMisteryColumn'.

The end result in df2 is correct. I also know other ways to do this; I am not looking for a different method, just wondering whether I have stumbled upon a bug.

My question is: isn't df1 separate from df2? Why is df1 also getting the same column?

import pandas as pd

df1 = pd.DataFrame({'A': ['some value', 'some value', 'another value'],
                    'B': ['rthyuyu', 'truyruyru', '56564'],
                    'C': ['tryrhyu', 'tryhyteru', '54676']})



df2 = df1

df2['NewMisteryColumn'] = df2.groupby(['A'])['C'].tail(1)

Answer

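This is not a bug. The problem is that df2 is not a copy: the assignment df2 = df1 just binds a second name to the same DataFrame object, so a column added through df2 also shows up in df1. Use df1.copy() to get an independent DataFrame.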

df2 = df1
df3 = df1.copy()

df1 is df2  # True
df1 is df3  # False

You can also verify this by comparing the ids:

id(df1)
id(df2)  # Same as id(df1)
id(df3)  # Different!
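
To tie this back to the question, here is a minimal sketch that runs the same groupby assignment on an explicit copy. The names alias and independent are only illustrative and do not come from the question; the data is the question's own. Under that setup, the new column should appear only on the copy and df1 should stay untouched.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['some value', 'some value', 'another value'],
                    'B': ['rthyuyu', 'truyruyru', '56564'],
                    'C': ['tryrhyu', 'tryhyteru', '54676']})

alias = df1               # hypothetical name: a second reference to the same object
independent = df1.copy()  # hypothetical name: a new DataFrame with its own data

# Same operation as in the question, but applied to the copy only.
independent['NewMisteryColumn'] = independent.groupby(['A'])['C'].tail(1)

alias_is_df1 = alias is df1              # plain assignment does not copy
independent_is_df1 = independent is df1  # .copy() creates a separate object
df1_columns = list(df1.columns)
independent_columns = list(independent.columns)

print(alias_is_df1, independent_is_df1)
print(df1_columns)
print(independent_columns)
```

Because tail(1) returns only the last row of each group, the new column is filled for those rows and left as NaN for the rest, which matches the result the question describes as correct.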