RF_PY - 22 days ago
Python Question

How to replace empty cells with 0 and change strings to integers where possible in a pandas dataframe?

I have a dataframe with 3000+ columns. Many cells in the dataframe are empty strings (' '). Also, I have a lot of numerical values that are stored as strings but should actually be integers. I wrote two functions to fill all the empty cells with 0 and, where possible, convert the values to integers, but when I run them nothing changes in my dataframe. The functions:

    def recode_empty_cells(dataframe, list_of_columns):

        for column in list_of_columns:
            dataframe[column].replace(r'\s+', np.nan, regex=True)
            dataframe[column].fillna(0)

        return dataframe

    def change_string_to_int(dataframe, list_of_columns):

        dataframe = recode_empty_cells(dataframe, list_of_columns)

        for column in list_of_columns:
            try:
                dataframe[column] = dataframe[column].astype(int)
            except ValueError:
                pass

        return dataframe


Note: I'm using a try/except statement because some columns contain text in some form. Thanks in advance for your help.

Edit:

Thanks to your help I got the first part working. All the empty cells have 0s now. This is my code at this moment:

    def recode_empty_cells(dataframe, list_of_columns):

        for column in list_of_columns:
            dataframe[column] = dataframe[column].replace(r'\s+', 0, regex=True)

        return dataframe

    def change_string_to_int(dataframe, list_of_columns):

        dataframe = recode_empty_cells(dataframe, list_of_columns)

        for column in list_of_columns:
            try:
                dataframe[column] = dataframe[column].astype(int)
            except ValueError:
                pass

        return dataframe


However, this gives me the following error:
OverflowError: Python int too large to convert to C long

Answer

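You are not saving your changes inside the function. By default, neither replace() nor fillna() modifies the column in place; each returns a new Series, so the result has to be assigned back: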

    import numpy as np

    def recode_empty_cells(dataframe, list_of_columns):

        for column in list_of_columns:
            # replace() and fillna() return new Series, so assign the results back
            dataframe[column] = dataframe[column].replace(r'\s+', np.nan, regex=True)
            dataframe[column] = dataframe[column].fillna(0)

        return dataframe
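
As a rough sketch of how both steps could be combined, the hypothetical helper below (clean_columns is my own name, not something from the question) assigns the result of replace() back and then tries pd.to_numeric instead of astype(int). Two things differ from the code above, and both are assumptions on my part. First, the pattern is anchored as ^\s*$ so that only cells that are empty or entirely whitespace are touched; as far as I can tell, an unanchored \s+ combined with a non-string replacement can also wipe out text cells that merely contain a space. Second, blanks are filled with the string '0' rather than the integer 0, so the column stays uniformly string-typed until it is converted.

```python
import pandas as pd


def clean_columns(dataframe, list_of_columns):
    """Return a copy with blank cells set to '0' and numeric-looking columns converted.

    Hypothetical helper for illustration; not part of the original question.
    """
    result = dataframe.copy()
    for column in list_of_columns:
        # replace() returns a new Series, so keep a reference to the result.
        # ^\s*$ matches only cells that are empty or entirely whitespace.
        cleaned = result[column].replace(r'^\s*$', '0', regex=True)
        try:
            # to_numeric raises ValueError if a cell cannot be parsed as a number.
            result[column] = pd.to_numeric(cleaned)
        except (ValueError, TypeError):
            # Text columns keep their strings; only the blanks were filled.
            result[column] = cleaned
    return result


example = pd.DataFrame({'a': ['1', ' ', '3'], 'b': ['x', '', 'y']})
converted = clean_columns(example, ['a', 'b'])
print(converted.dtypes)
```

Columns that still contain text raise ValueError inside to_numeric and are left as strings. Whether this avoids the OverflowError from the edit depends on the actual values, which the question does not show.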