piRSquared - 6 days ago
Python Question

Why would assigning to a DataFrame with loc and a list of columns be different from assigning a single column?

I'm trying to update a column from float to int. Consider `df` in the following two scenarios:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1.1, 2], B=[1., 2]))
print(df.A.dtype)

df.loc[:, ['A']] = df[['A']].astype(int)
print(df.A.dtype)
print(df)
```



The dtype failed to update to `int`, but the values in `'A'` are definitely truncated.
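The truncation happens on the right-hand side, before the assignment: `astype(int)` produces a genuine integer column, and `loc` then writes those integers back into the existing float64 column. A minimal sketch of the first scenario with the intermediate result kept in a variable (`converted` is a name introduced here for illustration; the commented outputs assume pandas 2.x):

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1.1, 2], B=[1., 2]))

# The right-hand side is already integer: astype(int) truncated 1.1 to 1
converted = df[["A"]].astype(int)
print(converted["A"].dtype)  # an integer dtype, e.g. int64

# loc fits the assigned values into the existing float64 column of df
df.loc[:, ["A"]] = converted
print(df["A"].dtype)     # float64
print(df["A"].tolist())  # [1.0, 2.0]
```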




However,

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1.1, 2], B=[1., 2]))
print(df.A.dtype)

df.loc[:, 'A'] = df.A.astype(int)
print(df.A.dtype)
print(df)
```



works just fine.
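Whether the single-column form behaves this way depends on the pandas version: since pandas 2.0, `df.loc[:, 'A'] = ...` first tries to set the values in place, so on current releases the second scenario can also leave `A` as float64. A minimal sketch of a version-independent way to change the column's dtype, assuming the goal is simply to replace `A` with its integer version, is plain `[]` assignment:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1.1, 2], B=[1., 2]))

# [] assignment replaces the whole column, so its dtype comes from the right-hand side
df["A"] = df["A"].astype(int)

print(df.dtypes)         # A: an integer dtype (e.g. int64), B: float64
print(df["A"].tolist())  # [1, 2]
```

`df = df.astype({'A': int})` is an equivalent alternative that returns a new frame with only `A` converted.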

Is there a justification for these behaving differently?

Answer

Right from the documentation:

Note: When trying to convert a subset of columns to a specified type using astype() and loc(), upcasting occurs. loc() tries to fit in what we are assigning to the current dtypes, while [] will overwrite them taking the dtype from the right hand side. Therefore the following piece of code produces the unintended result.
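The documentation's point is easiest to see side by side. The sketch below converts the same subset of columns twice, once through `loc` and once through `[]`, on copies of the question's frame (`cols`, `via_loc` and `via_getitem` are illustrative names, not from the docs). With pandas 2.x the first keeps float64, while the second ends up with integer columns:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1.1, 2], B=[1., 2]))
cols = ["A", "B"]

# loc: the integer values are fitted into the existing float64 columns
via_loc = df.copy()
via_loc.loc[:, cols] = via_loc[cols].astype(int)

# []: the columns are overwritten and take the dtype of the right-hand side
via_getitem = df.copy()
via_getitem[cols] = via_getitem[cols].astype(int)

print(via_loc.dtypes)      # A and B stay float64
print(via_getitem.dtypes)  # A and B become an integer dtype
```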
