Vignesh Sureshbabu Kishore - 10 months ago

String cleaning in Python


Input:

" The Elephant's 4 cats. "

Expected Output:

the elephants 4 cats


import re

s = " The Elephant's 4 cats. "
# replace each run of non-word characters with a single space
temp1 = re.sub(r'\W+', ' ', s).strip()
output = temp1.lower()

My output:

the elephant s 4 cats

I still get an extra space between "elephant" and "s". Another problem is that I can't remove '_' (underscores). Where am I going wrong? Any suggestions would be helpful.
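A quick way to see where the gap comes from is to list the runs that \W+ matches in the sample string, and to check whether '_' counts as a word character. This is a small diagnostic sketch; the variable names s and runs are only illustrative.

```python
import re

s = " The Elephant's 4 cats. "

# Every run of non-word characters that re.sub(r'\W+', ' ', s) turns into a single space.
# The apostrophe inside "Elephant's" is one of them, which is why "elephant s" comes out split.
runs = re.findall(r'\W+', s)
print(runs)  # → [' ', ' ', "'", ' ', ' ', '. ']

# '_' is part of \w (letters, digits and underscore), so \W never matches it
# and the substitution leaves underscores untouched.
print(re.fullmatch(r'\w', '_') is not None)  # → True
```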



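# s is the input string; delete punctuation and underscores instead of replacing them with a space
output = re.sub(r'[^\w\s]|_', '', s).strip().lower()

Basically, your original \W+ means "non-word characters", which matches spaces, apostrophes, and periods. So every one of them is replaced with a space, including the apostrophe, which is why "Elephant's" turns into "elephant s". Underscores survive because '_' counts as a word character, so \W never matches it.

By matching only characters that are neither word characters nor whitespace, and deleting them instead of replacing them with a space, the apostrophe and the period disappear without leaving a gap. Since \w already includes '_', underscores have to be matched separately, which is what the |_ alternative does.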
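Putting the pieces together, here is a minimal sketch of the whole cleanup as a function. The names clean_text and _JUNK are hypothetical, and the sketch assumes the goal is exactly the one in the question: drop punctuation and underscores, lowercase, and trim the ends. As an extra assumption, it also collapses any run of whitespace left behind into a single space, which the sample input doesn't strictly need.

```python
import re

# Characters to delete: anything that is neither a word character nor whitespace
# (apostrophes, periods, exclamation marks, ...), plus '_', which \w would otherwise keep.
_JUNK = re.compile(r'[^\w\s]|_')


def clean_text(text: str) -> str:
    """Lowercase text, delete punctuation and underscores, and normalize spacing."""
    without_junk = _JUNK.sub('', text)
    # split() with no arguments splits on runs of whitespace and drops empty pieces,
    # so joining with ' ' both trims the ends and collapses inner gaps.
    return ' '.join(without_junk.split()).lower()


print(clean_text(" The Elephant's 4 cats. "))          # → the elephants 4 cats
print(clean_text("snake_case and  extra   spaces!"))   # → snakecase and extra spaces
```

Deleting the matched characters (replacing them with '') rather than substituting a space is what keeps "Elephant's" together as "elephants". Note that for str patterns in Python 3, \w is Unicode-aware, so accented letters such as 'é' are kept; compiling with re.ASCII would make \w ASCII-only, and such letters would then be deleted.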