The problem I'm having is that I'm iterating over a fairly large CSV file. startDate and endDate are supplied by the user, and I need to search only within that range.
However, when I run the program up to that point, it takes a long time and then just prints "set()". I've marked the line where I'm having trouble in the code below.
I'm looking for suggestions and possibly sample code. Thank you all in advance!
def compare(word1, word2, startDate, endDate):
    with open('all_words.csv') as allWords:
        readWords = csv.reader(allWords, delimiter=',')
        year = set()
        for row in readWords:
            if row in range(int(startDate), int(endDate)):  # < Having trouble here
                if row == word1:
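The reason your test isn't finding any years is that the expression:
row in range(int(startDate), int(endDate))
checks whether a string value appears in a sequence of integers, and a string never compares equal to an integer. If you test:
"1970" in range(1960, 1980)
you will see that it returns False. You need to convert the value to an integer first:
int(row) in range(int(startDate), int(endDate))
(If row is the whole list of fields that csv.reader yields rather than a single field, pick out the year field before converting it.)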
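You can see the type mismatch for yourself with a quick check. This is only a minimal sketch; the values 1960, 1980 and "1970" are illustrative and not taken from your file:

```python
start_date, end_date = "1960", "1980"  # user input arrives as strings
value = "1970"                         # CSV fields are read as strings too

years = range(int(start_date), int(end_date))

string_hit = value in years    # a str is compared against ints: never equal
int_hit = int(value) in years  # an int is compared against ints

print(string_hit, int_hit)  # False True
```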
However, this is still quite inefficient. It checks whether the value int(row) occurs anywhere in the sequence [int(startDate), int(startDate)+1, ..., int(endDate)-1], and it does so by linear search. Much faster will be:
if int(startDate) <= int(row) < int(endDate):
Note that your code above was written to exclude endDate from the range of possible dates (because range excludes its second argument), and I've done the same in the comparison above.
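Putting the comparison into the loop might look something like the sketch below. The question doesn't show the layout of all_words.csv, so the column positions (WORD_COL, YEAR_COL), the helper name years_for_word and the sample rows are all assumptions made up for illustration; adjust them to match your file.

```python
import csv
import io

# ASSUMPTION: each row looks like word,year,count. The real layout of
# all_words.csv is not shown in the question, so adjust these indexes.
WORD_COL, YEAR_COL = 0, 1


def years_for_word(rows, word, start_date, end_date):
    """Collect the years in [start_date, end_date) in which `word` appears."""
    start, end = int(start_date), int(end_date)  # convert once, outside the loop
    years = set()
    for row in rows:
        year = int(row[YEAR_COL])  # fields from csv.reader are strings
        # Chained comparison; end is excluded, mirroring range(start, end)
        if start <= year < end and row[WORD_COL] == word:
            years.add(year)
    return years


# Made-up sample data standing in for all_words.csv
sample = io.StringIO(
    "apple,1959,3\napple,1960,5\npear,1970,2\napple,1979,1\napple,1980,4\n"
)
result = years_for_word(csv.reader(sample), "apple", "1960", "1980")
print(sorted(result))  # [1960, 1979]
```

Converting startDate and endDate once before the loop also saves two int() calls per row, which adds up on a large file.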
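Edit: Actually, I should point out that an expression like 500000 in range(1, 1000000) is only inefficient in Python 2, where range builds a full list that is then searched linearly. In Python 3 it's fast, because a range object tests integer membership arithmetically. (In Python 2, using xrange in place of range avoids building the list, but the in test still steps through the values one by one.)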
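If you're on Python 3, a quick way to convince yourself is to test membership in a range far too large to scan. This is just an illustrative sketch with arbitrary numbers:

```python
big = range(1, 10**15)      # no list is built; range is a lazy sequence
inside = 5 * 10**14 in big  # answered arithmetically, not by scanning
outside = 10**15 in big     # the stop value is excluded

print(inside, outside)  # True False
```

Both checks return immediately, which would not be possible if the values were being visited one at a time.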