Observer - 3 months ago
Python Question

Remove every word that contains particular characters from a string

Suppose I have data as follows:

data['sentences']

This is a sentence
Donald Trump
Machine Learning
Python is good


I want to search for certain character patterns and, wherever one is found, remove the whole word that contains it.

Suppose I want to remove words containing "enc", "ood" and "ump". The output should be:

data['sentences']

This is a
Donald
Machine Learning
Python is


I tried the following, using re.sub:

re.sub("enc", "", y)


But this gives output like "This is a sente". I am not sure how to remove the entire word.

Can anybody help me do this in Python? I want an efficient way to do it, because I need to run it over nearly 1 billion records using pyspark.

Thanks

Answer

Add \w* (zero or more word characters) before and after the substring, so that the match extends to the whole word:

re.sub(r'\w*enc\w*', '', y)

That replaces the specified substring, together with all the word characters (letters, digits and underscores) around it, with an empty string, i.e. it removes the whole word that contains the substring.
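
To handle all three substrings from the question in one pass, the same idea can be extended with an alternation. The following is only a minimal sketch: the names FRAGMENTS, WORD_WITH_FRAGMENT and remove_words are made up for illustration, and the whitespace cleanup is an added step, since removing a word leaves the space next to it behind.

```python
import re

# Substrings from the question; any word containing one of them is dropped.
FRAGMENTS = ["enc", "ood", "ump"]

# Build one pattern: \w*(?:enc|ood|ump)\w*
# re.escape guards against fragments that contain regex metacharacters.
WORD_WITH_FRAGMENT = re.compile(
    r"\w*(?:" + "|".join(re.escape(f) for f in FRAGMENTS) + r")\w*"
)


def remove_words(sentence):
    """Drop every word containing a fragment, then tidy the whitespace."""
    stripped = WORD_WITH_FRAGMENT.sub("", sentence)
    # Removing a word leaves its neighbouring space behind; collapse the gaps.
    return " ".join(stripped.split())


sentences = ["This is a sentence", "Donald Trump", "Machine Learning", "Python is good"]
cleaned = [remove_words(s) for s in sentences]
print(cleaned)
# ['This is a', 'Donald', 'Machine Learning', 'Python is']
```

Compiling the pattern once avoids rebuilding it for every record. For pyspark, the same pattern string can probably be passed to pyspark.sql.functions.regexp_replace on the column (followed by a trim), which keeps the work inside Spark instead of a Python UDF. Spark uses Java regular expressions there, so it is worth checking the pattern on a small sample first.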