James C. - 4 months ago
Python Question

Removing certain rows in Pandas Dataframe by string format

I have a pandas DataFrame with a column called Zip Code. The column has the object dtype, and some rows are not in a proper zip code format. I would like to remove the rows whose value is not a five-digit zip code (##### format).

  Subscriber Type Zip Code
0      Subscriber    94040
1        Customer    11231
2        Customer    11231
3        Customer       32
4        Customer      nil


What would be an easy way to do so?
Is there a way to compare each record against the format, something like this? df.drop(df['Zip Code'] != #####)

Answer

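Try this. The pattern is anchored with ^ and $ so that only values consisting of exactly five digits match (an unanchored \d{5} would also match longer strings such as 123456):

In [23]: df = df[df['Zip Code'].str.contains(r'^\d{5}$', na=False)]

In [24]: df
Out[24]:
  Subscriber Type Zip Code
0      Subscriber    94040
1        Customer    11231
2        Customer    11231

Explanation: the expression returns a boolean mask that is True for the rows to keep, and na=False treats missing values as non-matches:

In [22]: df['Zip Code'].str.contains(r'^\d{5}$', na=False)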
Out[22]:
0     True
1     True
2     True
3    False
4    False
Name: Zip Code, dtype: bool
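
One caveat worth illustrating: with a regex, str.contains searches anywhere in the string, so whether the pattern is anchored changes which values pass. The snippet below is a minimal sketch using only the standard re module to show that difference. The values 123456 and 94040-1234 are hypothetical edge cases added for illustration; they are not in the question's data.

```python
import re

# Values from the question, plus two hypothetical edge cases
# ("123456" and "94040-1234") that are not in the original data.
zip_codes = ["94040", "11231", "32", "nil", "123456", "94040-1234"]

unanchored = re.compile(r"\d{5}")    # five digits anywhere in the string
anchored = re.compile(r"^\d{5}$")    # the whole string is exactly five digits

# re.search scans the whole string for a match of the pattern.
loose = [z for z in zip_codes if unanchored.search(z)]
strict = [z for z in zip_codes if anchored.search(z)]

print(loose)   # ['94040', '11231', '123456', '94040-1234']
print(strict)  # ['94040', '11231']
```

In pandas, the same anchored pattern can be passed to df['Zip Code'].str.contains(...), and na=False makes missing values count as non-matches so the result can be used directly as a boolean mask.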