hansolo - 4 months ago
Python Question

Python Print Distinct Values

I am using Tweepy in Python 2.7 to store the results of a search query in a CSV file. I am trying to figure out how to print only the number of unique tweet ids (tweet.id) in my result set. I know that len(list) works, but I haven't initialized a list here. I am new to Python programming, so the solution may be obvious. Any help is appreciated.

for tweet in tweepy.Cursor(api.search,
                q="Wookie",
                #since="2014-02-14",
                #until="2014-02-15",
                lang="en").items(5000000):
    #Write a row to the csv file
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8'), tweet.favorite_count, tweet.user.name, tweet.id])
    print "...%s tweets downloaded so far" % (len(tweet.id))
csvFile.close()

Answer

You could use a set to keep track of the unique ids you've seen so far, and then print the size of that set:

ids = set()
for tweet in tweepy.Cursor(api.search, 
                q="Wookie", 
                #since="2014-02-14", 
                #until="2014-02-15", 
                lang="en").items(5000000):
    #Write a row to the csv file
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8'), tweet.favorite_count, tweet.user.name, tweet.id])
    # tweet.id is an integer, so len(tweet.id) would raise a TypeError; count the set instead
    ids.add(tweet.id) # add the id; the set ignores ids it already holds
    print "number of unique ids seen so far: {}".format(len(ids))
csvFile.close()

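Sets are like lists, except that they are unordered and keep only unique elements. Adding an id that is already in the set leaves it unchanged, so len(ids) is always the number of distinct ids seen so far.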
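
To see the set behaviour in isolation, here is a minimal sketch without Tweepy. The function name count_unique and the sample ids are made up for illustration; in the loop above the ids would come from tweet.id instead. It is written to run on both Python 2.7 and Python 3.

```python
def count_unique(ids_seen):
    """Return (total, unique) counts for an iterable of ids."""
    unique_ids = set()
    total = 0
    for tweet_id in ids_seen:
        total += 1
        unique_ids.add(tweet_id)  # no effect if the id is already in the set
    return total, len(unique_ids)


# Hypothetical ids; 101 appears twice
sample_ids = [101, 102, 101, 103]
total, unique = count_unique(sample_ids)
print("%d rows written, %d unique ids" % (total, unique))
```

Keeping the total and the unique count side by side shows how many of the downloaded rows were duplicates.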