I have defined a function that returns the sentences containing a specified word from an Excel file with a 'text' column.
With the help of @Julien Marrec I redefined the function so that I can pass multiple words as an argument, as below:
from nltk.tokenize import sent_tokenize, word_tokenize

searched_words = ['word1', 'word2', 'word3', .......]
df['text'].apply(lambda text: [sent for sent in sent_tokenize(text)
                               if any(w.lower() in searched_words
                                      for w in word_tokenize(sent))])
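To see the logic in isolation, here is a self-contained sketch of the same list comprehension. Note that the simple regex-based `sent_tokenize`/`word_tokenize` stand-ins and the sample sentences are mine, used only so the snippet runs without NLTK's models; in practice you would use NLTK's tokenizers as above.

```python
import re

# Naive stand-ins for nltk's sent_tokenize / word_tokenize (illustration only)
def sent_tokenize(text):
    # split on sentence-ending punctuation followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text) if s]

def word_tokenize(sent):
    # crude word splitter: runs of word characters
    return re.findall(r"\w+", sent)

searched_words = ['cat', 'dog']

def matching_sentences(text):
    # keep a sentence if any of its tokens (lowercased) is a searched word
    return [sent for sent in sent_tokenize(text)
            if any(w.lower() in searched_words for w in word_tokenize(sent))]

print(matching_sentences("The cat sat. It rained. A Dog barked."))
# → ['The cat sat.', 'A Dog barked.']
```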
If you don't care about word boundaries, you can skip word tokenisation and just match with a regular expression.
However, this might give you a lot of matches that you didn't expect. For example, the search terms "tin" and "nation" will both match in the word "procrastination". If that is what you want, you can do the following:
import re

fsa = re.compile('|'.join(re.escape(w.lower()) for w in searched_words),
                 re.IGNORECASE)
df['text'].apply(lambda text: [sent for sent in sent_tokenize(text)
                               if fsa.search(sent)])

(Note the `re.IGNORECASE` flag: the pattern is built from lowercased words, so without it the search would miss capitalised occurrences in the sentences.)
The re.compile() call creates a regex pattern object, which consists simply of a set of alternatives, one per searched word.
This lets you scan through the complete sentence, looking for all of the searched words at the same time.
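If you do want word boundaries back without paying for full tokenisation, you can wrap the alternatives in `\b` anchors. This variant is my own extension of the pattern above; the sample sentences are illustrative:

```python
import re

searched_words = ['tin', 'nation']

# \b anchors restrict matches to whole words, so 'procrastination'
# no longer matches even though it contains both 'tin' and 'nation'
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(w.lower()) for w in searched_words) + r')\b',
    re.IGNORECASE)

print(bool(pattern.search("Stop procrastination now")))  # → False
print(bool(pattern.search("A tin roof")))                # → True
```

This keeps the single-pass scan over each sentence, but trades a little speed for the boundary checks the tokenised version gave you for free.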