medev21 medev21 - 3 months ago 28
Python Question

Pandas - how to remove spaces in each column in a dataframe?

I'm trying to remove spaces, apostrophes, and double quote in each column data using this for loop

for c in data.columns:
data[c] = data[c].str.strip().replace(',', '').replace('\'', '').replace('\"', '').strip()


but I keep getting this error:

AttributeError: 'Series' object has no attribute 'strip'


data is the data frame and was obtained from an excel file

xl = pd.ExcelFile('test.xlsx');
data = xl.parse(sheetname='Sheet1')


Am I missing something? I added the
str
but that didn't help. Is there a better way to do this.

I don't want to use the column labels, like so
data['column label']
, because the text can be different. I would like to iterate each column and remove the characters mentioned above.

incoming data:

id city country
1 Ontario Canada
2 Calgary ' Canada'
3 'Vancouver Canada


desired output:

id city country
1 Ontario Canada
2 Calgary Canada
3 Vancouver Canada

Answer

UPDATE: using your sample DF:

In [80]: df
Out[80]:
   id         city   country
0   1      Ontario    Canada
1   2    Calgary '   Canada'
2   3  'Vancouver    Canada

In [81]: df.replace(r'[,\"\']','', regex=True).replace(r'\s*([^\s]+)\s*', r'\1', regex=True)
Out[81]:
   id       city country
0   1    Ontario  Canada
1   2    Calgary  Canada
2   3  Vancouver  Canada

OLD answer:

you can use DataFrame.replace() method:

In [75]: df.to_dict('r')
Out[75]:
[{'a': ' x,y ', 'b': 'a"b"c', 'c': 'zzz'},
 {'a': "x'y'z", 'b': 'zzz', 'c': '  ,s,,'}]


In [76]: df
Out[76]:
       a      b       c
0   x,y   a"b"c     zzz
1  x'y'z    zzz    ,s,,

In [77]: df.replace(r'[,\"\']','', regex=True).replace(r'\s*([^\s]+)\s*', r'\1', regex=True)
Out[77]:
     a    b    c
0   xy  abc  zzz
1  xyz  zzz    s

r'\1' - is a numbered capturing RegEx group

Comments