Jack-Jack - 3 years ago
Python Question

My function remove_stopwords removes the stopwords inside every word

So I'm trying to remove all the stop words from a text file. The problem is that it is removing the stop words from inside each and every word.

def remove_stopwords(input):
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in input if not word in stop_words]
    return filtered_words

Sample Input: Damage from Typhoon Lando soars to P6B
Output: Dge fr Tphn Ln r P6B

Answer Source

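Tokenize your str input before removing stop words. Iterating over a string yields one character at a time, so your comprehension drops every letter that happens to be a one-letter stop word (such as 'a', 'i' or 'o').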

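from nltk.corpus import stopwords
from nltk import word_tokenize

# Build the set once, outside the function, so it is not rebuilt on every call.
stoplist = set(stopwords.words('english'))

def remove_stopwords(text):
    # Tokenize first so whole words, not characters, are compared.
    return [word for word in word_tokenize(text) if word not in stoplist]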
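
To see the difference without downloading any NLTK data, here is a minimal sketch. It rests on some assumptions: STOP_WORDS is a tiny hypothetical stand-in for the NLTK English list, str.split() stands in for word_tokenize, and the names filter_chars and filter_words are made up for illustration. The lower-casing in filter_words is an extra that is not in the answer above.

```python
# Hypothetical miniature stop-word set, standing in for
# set(stopwords.words('english')) so the sketch runs without NLTK data.
STOP_WORDS = {"a", "i", "o", "m", "s", "t", "d", "y", "from", "to", "the"}


def filter_chars(text):
    # Iterating over a str yields single characters, so every letter that
    # is also a one-letter stop word is stripped out of each word.
    return "".join(ch for ch in text if ch not in STOP_WORDS)


def filter_words(text):
    # str.split() is a simple stand-in for word_tokenize here.
    # Lower-casing each token makes the comparison case-insensitive.
    return [word for word in text.split() if word.lower() not in STOP_WORDS]


sample = "Damage from Typhoon Lando soars to P6B"
print(filter_chars(sample))   # mangled words, as in the question
print(filter_words(sample))   # ['Damage', 'Typhoon', 'Lando', 'soars', 'P6B']
```

If you need the result back as a string, " ".join(filter_words(sample)) should do it.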