hans-t - 4 months ago

How to do assignment in Pandas without warning?

I'm trying to port this code in R to Python using Pandas.

This is my R code (assume data is a data.frame):

transform <- function(data) {
    baseValue <- data$baseValue

    na.base.value <- is.na(baseValue)
    baseValue[na.base.value] <- 1

    zero.base.value <- baseValue == 0
    baseValue[zero.base.value] <- 1

    data$adjustedBaseValue <- data$baseRatio * baseValue

    baseValue[na.base.value] <- -1
    baseValue[zero.base.value] <- 0
    data$baseValue <- baseValue

    return(data)
}


This is my attempt to port the R code to Python (assume data is a pandas.DataFrame):

import pandas as pd

def transform(data):
    base_value = data['baseValue']

    na_base_value = base_value.isnull()
    base_value.loc[na_base_value] = 1

    zero_base_value = base_value == 0
    base_value.loc[zero_base_value] = 1

    data['adjustedBaseValue'] = data['baseRatio'] * base_value

    base_value.loc[na_base_value] = -1
    base_value.loc[zero_base_value] = 0

    return data


But then I got this warning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)


I have read through the documentation but still don't understand how to fix it. What should I change so that the warning no longer appears? I don't want to just suppress the warning.

Answer

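The warning comes from chained assignment: base_value is a Series extracted from data, and depending on pandas' internals it may be either a view of data or a copy, so writing into it with base_value.loc[...] = ... may or may not update data. Assigning through a single data.loc[rows, 'baseValue'] call writes into the DataFrame directly. If you want to modify the same DataFrame that was passed to the function, the following should work, as long as what's passed in as data isn't itself a view of another DataFrame.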

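def transform(data):
    # Compute both masks from the original column before changing anything
    na_base_value = data['baseValue'].isnull()
    zero_base_value = data['baseValue'] == 0

    # A single .loc[rows, column] call writes into data itself (no chained assignment)
    data.loc[na_base_value, 'baseValue'] = 1
    data.loc[zero_base_value, 'baseValue'] = 1

    # Read the column back from data rather than through an earlier Series reference,
    # which may not reflect the assignments above (e.g. under Copy-on-Write)
    data['adjustedBaseValue'] = data['baseRatio'] * data['baseValue']

    # Restore the sentinels: -1 for missing values, 0 for zeros
    data.loc[na_base_value, 'baseValue'] = -1
    data.loc[zero_base_value, 'baseValue'] = 0

    return data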
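
The caveat about views matters when the DataFrame you pass in was itself cut out of a larger one, for example by boolean filtering. A minimal sketch, assuming a small made-up frame df (the column values are purely illustrative): taking an explicit .copy() of the subset gives an independent object, so .loc writes on it should not trigger the warning and leave df untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one missing and one zero baseValue
df = pd.DataFrame({
    'baseValue': [np.nan, 0.0, 4.0],
    'baseRatio': [0.5, 2.0, 0.25],
})

# Boolean filtering produces a new object that pandas may still track as derived
# from df; .copy() makes it explicitly independent
subset = df[df['baseRatio'] > 0.3].copy()

# One .loc call with a row mask and a column label writes directly into subset
subset.loc[subset['baseValue'].isnull(), 'baseValue'] = 1

print(subset['baseValue'].tolist())    # [1.0, 0.0]
print(df['baseValue'].isnull().sum())  # 1, the original frame is unchanged
```

The same idea applies to the function above: call it as transform(some_slice.copy()) when some_slice came from filtering another DataFrame.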

If you would rather work on a copy and return the modified copy, leaving the caller's DataFrame untouched, use this version instead:

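def transform(data):
    # Work on an independent copy so the caller's DataFrame is not modified
    data = data.copy()

    # Explicit copy of the column: writing into it cannot be chained assignment
    base_value = data['baseValue'].copy()

    na_base_value = base_value.isnull()
    base_value.loc[na_base_value] = 1

    zero_base_value = base_value == 0
    base_value.loc[zero_base_value] = 1

    data['adjustedBaseValue'] = data['baseRatio'] * base_value

    base_value.loc[na_base_value] = -1
    base_value.loc[zero_base_value] = 0
    # Write the column back, mirroring data$baseValue <- baseValue in the R code
    data['baseValue'] = base_value

    return data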
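
If you don't need to mutate anything at all, a mask-based variant avoids element-wise assignment entirely. This is a sketch of a different technique from the two versions above, not part of the original answer; the function name transform_no_mutation is hypothetical. It relies on Series.mask, Series.fillna and DataFrame.assign, and it returns a new DataFrame.

```python
import numpy as np
import pandas as pd

def transform_no_mutation(data):
    """Hypothetical mask-based sketch: returns a new DataFrame, never assigns into a slice."""
    base = data['baseValue']
    is_na = base.isnull()
    is_zero = base == 0  # NaN == 0 evaluates to False, so the masks don't overlap

    # Series.mask replaces values where the condition is True: NaN and 0 become 1
    safe_base = base.mask(is_na | is_zero, 1)

    # assign() returns a new DataFrame with the added/overwritten columns.
    # Zeros are already 0, so only the NaNs need the -1 sentinel.
    return data.assign(
        adjustedBaseValue=data['baseRatio'] * safe_base,
        baseValue=base.fillna(-1),
    )

# Usage with made-up data
df = pd.DataFrame({'baseValue': [np.nan, 0.0, 4.0], 'baseRatio': [0.5, 2.0, 0.25]})
result = transform_no_mutation(df)
print(result['adjustedBaseValue'].tolist())  # [0.5, 2.0, 1.0]
print(result['baseValue'].tolist())          # [-1.0, 0.0, 4.0]
```

Because every step builds a new Series, there is no intermediate object for pandas to warn about, and df keeps its original NaN.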