piRSquared piRSquared - 2 months ago 10
Python Question

What is the benefit of a dataframe view or copy

I've seen many questions about the infamous

SettingWithCopy
warning. I've even ventured to answer a few of them. Recently, I was putting together an answer that involved this topic and I wanted to present the benefit of a dataframe view. I failed to produce a tangible demonstration of why it is a good idea to create a dataframe view or whatever the thing is that generates
SettingWithCopy


Consider
df


df = pd.DataFrame([[1, 2], [3, 4]], list('ab'), list('AB'))
df

A B
x 1 2
y 3 4


and
dfv
which is a copy of
df


dfv = df[['A']]





print(dfv.is_copy)

<weakref at 0000000010916E08; to 'DataFrame' at 000000000EBF95C0>





print(bool(dfv.is_copy))

True





I can generate the
SettingWithCopy


dfv.iloc[0, 0] = 0


enter image description here




However,
dfv
has changed

print(dfv)

A
a 0
b 3


df
has not

print(df)

A B
x 1 2
y 3 4





and
dfv
is still a copy

print(bool(dfv.is_copy))

True





If I change
df


df.iloc[0, 0] = 7
print(df)

A B
x 7 2
y 3 4





But
dfv
has not changed. However, I can reference
df
from
dfv


print(dfv.is_copy())

A B
x 7 2
y 3 4





Question



If
dfv
maintains it's own data (meaning, it doesn't actually save memory) and it assigns values via assignment operations despite warnings, then why do we bother saving references in the first place and generate
SettingWithCopyWarning
at all?

What is the tangible benefit?

Answer

There is a lot of existing discussion on this, see here for instance, including the attempted PRs. It's also worth noting that true copy-on-write for views is being considered as part of the "pandas 2.0" refactor, see here.

The reason the reference is being maintained in your example is specifically because it's not a view, so that if someone tries this, they'll get a warning.

df[['A']].iloc[0, 0] = 1
Comments