Dreamer Dreamer - 8 months ago 49
Python Question

Update missing values in a column using pandas

I have a dataframe df with two of the columns being 'city' and 'zip_code':

df = pd.DataFrame({'city': ['Cambridge','Washington','Miami','Cambridge','Miami',
'Washington'], 'zip_code': ['12345','67891','23457','','','']})

As shown above, a particular city contains zip code in one of the rows, but the zip_code is missing for the same city in some other row. I want to fill those missing values based on the zip_code values of that city in some other row. Basically, wherever there is a missing zip_code, it checks zip_code for that city in other rows, and if found, fills the value for zip_code.If not found, fills 'NA'.

How do I accomplish this task using pandas?


You can go for:

import numpy as np

df['zip_code'] = df.replace(r'', np.nan).groupby('city')['zip_code'].fillna(method='ffill').fillna(method='bfill')

>>> df
         city zip_code
0   Cambridge    12345
1  Washington    67891
2       Miami    23457
3   Cambridge    12345
4       Miami    23457
5  Washington    67891