Vignesh Sureshbabu Kishore - 3 months ago
Python Question

String cleaning in Python

Input:

" The Elephant's 4 cats. "


Expected Output:

the elephants 4 cats


Code:

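import re

text = " The Elephant's 4 cats. "  # input string (named text, since str would shadow the built-in)
temp1 = re.sub(r'\W+', ' ', text).strip()
output = temp1.lower()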


My output:

the elephant s 4 cats


I still get an extra space between "elephant" and "s". The other problem is that I can't remove '_' (underscore). Where am I going wrong? Any suggestions would be helpful.
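A quick check makes both symptoms visible. This is a sketch using a made-up sample string (the question's input with an underscore added); it shows which characters \W matches and what the replace-with-a-space approach produces:

```python
import re

# Hypothetical sample: the question's input with an underscore added.
sample = "The Elephant's 4_cats."

# \W matches any character that is not a letter, digit, or underscore,
# so the apostrophe and the period match but the underscore does not.
matches = re.findall(r"\W", sample)
print(matches)  # [' ', "'", ' ', '.']

# The approach from the question: replace each run of non-word characters
# with one space, then trim and lowercase.
result = re.sub(r"\W+", " ", sample).strip().lower()
print(result)  # the elephant s 4_cats
```

The apostrophe turns into a space (hence "elephant s"), and the underscore is never touched because \W does not match it.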

Answer

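Try:

temp1 = re.sub(r'[^\w\s]|_', '', text).strip()  # text is the input string
output = temp1.lower()

Basically, your original \W+ matches runs of "non-word characters", i.e. anything other than a letter, digit, or underscore. That covers spaces, the apostrophe, and the period, and every run is replaced with a single space, so the apostrophe in "Elephant's" becomes a space and you get "elephant s". Underscores count as word characters, so \W never matches them, which is why '_' survives.

By deleting (instead of replacing with a space) every character that is neither a word character nor whitespace, and matching the underscore explicitly with |_ (needed because \w includes it), you keep "elephants" in one piece and get rid of the underscores too.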
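Putting it together, here is a minimal sketch of a reusable helper. The name clean is hypothetical, and it assumes "punctuation" means any character that is neither a word character nor whitespace. It also collapses repeated whitespace, which the one-liner above does not:

```python
import re


def clean(text: str) -> str:
    """Lowercase text and remove punctuation and underscores (sketch)."""
    # Delete (not replace with a space) every character that is neither a
    # word character nor whitespace, plus underscores, which the word-character
    # class includes. "Elephant's" becomes "Elephants", not "Elephant s".
    no_punct = re.sub(r"[^\w\s]|_", "", text)
    # Collapse runs of whitespace into single spaces and trim both ends.
    collapsed = " ".join(no_punct.split())
    return collapsed.lower()


print(clean(" The Elephant's 4 cats. "))  # the elephants 4 cats
print(clean("The_Elephant's 4  cats!"))   # theelephants 4 cats
```

Deleting the underscore glues the neighbouring words together ("The_Elephant" becomes "theelephant"). If underscores act as word separators in your data, replacing them with a space before the other steps may be closer to what you want.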