I have a file, filterlist.txt, that is a list of words, one word per line.
The other file, text.txt, is one giant string of text.
I want to find all the instances of the words from filterlist.txt in text.txt and delete them.
Here is what I have so far:
    text = open('text.txt').read().split()
    filter_words = open('filterlist.txt').readline()
    for line in text:
        for word in filter_words:
            if word == filter_words:
Store the filter words in a set, iterate over the words from the line in text.txt, and keep only the words that are not in the set of filter words:
    with open('text.txt') as text, open('filterlist.txt') as filter_words:
        st = set(map(str.rstrip, filter_words))
        txt = next(text).split()
        out = [word for word in txt if word not in st]
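As a minimal self-contained sketch of the same idea (using inline data as hypothetical stand-ins for the two files), the set lookup behaves like this:

```python
# Hypothetical inline stand-ins for filterlist.txt and text.txt.
filter_words = ["the", "a", "of"]
text = "the quick brown fox of the north"

st = set(filter_words)  # a set gives O(1) average-case membership tests
out = [word for word in text.split() if word not in st]
print(' '.join(out))  # -> quick brown fox north
```

The filter words go into a set rather than a list precisely because the `word not in st` test runs once per word of text, and set lookups are constant time on average.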
If you want to ignore case and remove punctuation, you will need to call lower() on each line and strip the punctuation:
    from string import punctuation

    with open('text.txt') as text, open('filterlist.txt') as filter_words:
        st = set(word.lower().rstrip(punctuation + "\n") for word in filter_words)
        txt = next(text).lower().split()
        out = [word for word in txt if word not in st]
If you had multiple lines in text.txt, (word for line in text for word in line.split()) would be a more memory-efficient approach, since it streams one word at a time instead of reading the whole file into memory.
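Putting that together, a minimal sketch of the multi-line version (again with inline lists of lines as hypothetical stand-ins for the files, since iterating an open file yields lines in exactly the same way) might look like:

```python
# Hypothetical stand-ins for filterlist.txt and a multi-line text.txt.
filter_lines = ["the\n", "of\n"]
text_lines = ["the quick brown fox\n",
              "jumps over the lazy dog of doom\n"]

st = set(w.rstrip() for w in filter_lines)
# Generator expression: yields one word at a time across all lines,
# so the full text never has to be held in memory at once.
words = (word for line in text_lines for word in line.split())
out = [word for word in words if word not in st]
```

With real files you would replace the two lists with the open file objects from the `with` block above; nothing else changes.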