Stackimus Prime - 3 months ago
Python Question

Does particular string match strings in text file

I have a text file containing many words (single word on each line). I have to read in each word, modify the words, and then check if the modified word matches any of the words in the file. I am having trouble with the last part (it is the hasMatch method in my code). It sounds simple enough and I know what I should do, but whatever I try does not work.

#read in textfile
myFile = open('good_words.txt')


#function to remove first and last character in string, and reverse string
def modifyString(str):
    rmFirstLast = str[1:len(str)-2] #slicing first and last char
    reverseStr = rmFirstLast[::-1] #reverse string
    return reverseStr

#go through list of words to determine if any string match modified string
def hasMatch(modifiedStr):
    for line in myFile:
        if line == modifiedStr:
            print(modifiedStr + " found")
        else:
            print(modifiedStr + "not found")

for line in myFile:
    word = str(line) #save string in line to a variable

    #only modify strings that are greater than length 3
    if len(word) >= 4:
        #global modifiedStr #make variable global
        modifiedStr = modifyString(word) #do string modification
        hasMatch(modifiedStr)

myFile.close()

Answer

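Several problems here:

  1. You have to strip the lines, or the trailing linefeed/CR characters make every comparison fail.
  2. You have to read the file once up front, because the file iterator is exhausted after the first full pass. In the original code, hasMatch consumes the rest of the file on its first call, so the outer loop ends as well.
  3. The speed is bad: scanning every line for every word is slow, so the search is sped up by using a set instead of a list.
  4. The slicing is overly complicated and wrong: on a stripped word, str[1:len(str)-2] drops the last two characters. str[1:-1] does it (thanks to those who commented on my answer).
  5. The whole code is really too long and complex. I summed it up in a few lines.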
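
Points 1, 2 and 4 can be reproduced without the real file. This is only a minimal sketch: it assumes io.StringIO as an in-memory stand-in for good_words.txt, and the three sample words are made up for illustration.

```python
import io

# In-memory stand-in for good_words.txt (the sample words are an assumption)
fake_file = io.StringIO("most\nso\nfrom\n")

# Point 1: lines read from a file keep their trailing newline
first_pass = list(fake_file)
print(first_pass)                       # ['most\n', 'so\n', 'from\n']
print(first_pass[1] == "so")            # False: 'so\n' is not 'so'
print(first_pass[1].strip() == "so")    # True once the line is stripped

# Point 2: the file iterator is exhausted after one full pass
second_pass = list(fake_file)
print(second_pass)                      # []

# Point 4: on a stripped word, [1:-1] drops the first and last character
word = "most"
print(word[1:-1])                       # os
print(word[1:-1][::-1])                 # so
```

The empty second pass is why the original outer loop stops after the first word: the inner loop in hasMatch has already consumed the rest of the file.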

code:

#read in textfile
myFile = open('good_words.txt')
# make a set (faster search), remove linefeeds
lines = set(x.strip() for x in myFile)
myFile.close()

# iterate on the lines
for word in lines:
    #only consider strings that are greater than length 3
    if len(word) >= 4:
        modifiedStr = word[1:-1][::-1] #do string modification
        if modifiedStr in lines:
            print(modifiedStr + " found (was "+word+")")
        else:
            print(modifiedStr + " not found")

I tested the program on a list of common English words and got these matches:

so found (was most)
or found (was from)
no found (was long)
on found (was know)
to found (was both)
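
To check the logic without a word file, the same set-based lookup can be wrapped in a function and run on a small in-memory list. This is a sketch, not part of the original answer: the name find_matches and the sample words are hypothetical.

```python
def find_matches(words):
    """Return (modified, original) pairs whose modified form is also a known word."""
    # Strip whitespace and build a set for fast membership tests
    known = set(w.strip() for w in words)
    matches = []
    # Sort so the output order is deterministic (sets are unordered)
    for word in sorted(known):
        # Only consider words with at least 4 characters
        if len(word) >= 4:
            # Drop the first and last character, then reverse
            modified = word[1:-1][::-1]
            if modified in known:
                matches.append((modified, word))
    return matches


# Hypothetical sample data
sample = ["most", "so", "from", "or", "long", "tree"]
for modified, original in find_matches(sample):
    print(modified + " found (was " + original + ")")
# or found (was from)
# so found (was most)
```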

Edit: another version which drops the set and uses bisect on the sorted list to avoid hashing/hash collisions.

import bisect

#read in textfile
myFile = open("good_words.txt")
lines = sorted(x.strip() for x in myFile) # make a sorted list, remove linefeeds
myFile.close()

for word in lines:

    #only modify strings that are greater than length 3
    if len(word) >= 4:
        modifiedStr = word[1:-1][::-1] #do string modification
        # search where to insert the modified word
        i=bisect.bisect_left(lines,modifiedStr)
        # if can be inserted and word is actually at this position: found
        if i<len(lines) and lines[i]==modifiedStr:
            print(modifiedStr + " found (was "+word+")")
        else:
            print(modifiedStr + " not found")
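
The bisect lookup can also be isolated in a small helper, which makes the "insertion point, then equality check" step easier to test on its own. This is a sketch under assumptions: contains_sorted is a hypothetical name, and the word list is made up.

```python
import bisect


def contains_sorted(sorted_words, target):
    """Binary search: True if target is present in the sorted list."""
    # Leftmost position where target could be inserted while keeping order
    i = bisect.bisect_left(sorted_words, target)
    # target is present only if that position exists and holds target
    return i < len(sorted_words) and sorted_words[i] == target


# Hypothetical sample data; the list must be sorted before searching
words = sorted(["most", "so", "from", "or", "long"])
print(contains_sorted(words, "so"))    # True
print(contains_sorted(words, "no"))    # False
print(contains_sorted(words, "zzz"))   # False (insertion point is past the end)
```

Each lookup takes O(log n) comparisons on the sorted list, compared with average O(1) for the set version, so this variant trades some lookup speed for not needing a hash table.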