Observer - 11 months ago
Python Question

Remove every word in a string that contains particular characters

Suppose I have data as follows:


This is a sentence
Donald Trump
Machine Learning
Python is good

I want to search for a pattern of characters and, if it is found, remove the whole word that contains those characters.

Suppose I want to remove words containing "enc", "ood" and "ump"; the output should be:


This is a
Donald
Machine Learning
Python is

I tried the following, using re.sub:

re.sub("enc", "", y)

But this gives output like "This is a sente". I am not sure how to remove the entire word.

Can anybody help me do this in Python? I want an efficient way to do it, because I need to run it on nearly 1 billion records using PySpark.


Answer Source

Add \w* (zero or more word characters) before and after the search string:

re.sub(r'\w*enc\w*', '', y)

That replaces the specified string, together with all the word characters (letters, digits and underscores) around it, with an empty string. In other words, it removes the whole word that contains the string.
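To cover all three substrings from the question in one pass, the same idea can be extended with an alternation. The following is only a minimal sketch: the names `fragments`, `pattern` and `remove_words` are made up for illustration, and the whitespace cleanup at the end is an assumption about the desired output rather than something the question specifies.

```python
import re

# Substrings from the question; any word containing one of them is dropped.
fragments = ["enc", "ood", "ump"]

# \w* on both sides extends the match to the whole word around the fragment.
# re.escape guards against fragments that contain regex metacharacters.
pattern = re.compile(r"\w*(?:" + "|".join(map(re.escape, fragments)) + r")\w*")

def remove_words(text):
    """Remove every word containing one of the fragments, then tidy spaces."""
    stripped = pattern.sub("", text)
    # Collapse the gaps left behind by the removed words.
    return " ".join(stripped.split())

rows = ["This is a sentence", "Donald Trump", "Machine Learning", "Python is good"]
cleaned = [remove_words(row) for row in rows]
print(cleaned)
```

With the sample data, "Donald Trump" becomes "Donald" and "Machine Learning" is left unchanged. Compiling the pattern once avoids rebuilding it for every record. For the PySpark case, the same pattern string could probably be passed to `pyspark.sql.functions.regexp_replace`, which uses Java regular expressions, but that is untested here.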