Linguistics Student - 1 year ago
Python Question

Tweepy Location Filter Does Not Work


I am trying to collect 50 tweets that originate from a specified geographic region. My code below prints 50 tweets, but many of them have None for coordinates. Does this mean that the tweets with None were not generated from the specified area? Can you explain what is happening here, and how I can collect 50 tweets from this geographic area? Thanks in advance.

# Import json (falling back to simplejson), Tweepy, sys, and sleep
try:
    import json
except ImportError:
    import simplejson as json
import tweepy, sys
from time import sleep
from credentials import *

# Authorize with the Twitter credentials imported from the credentials module
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Bounding box as [west longitude, south latitude, east longitude, north latitude]
box = [-74.0,40.73,-73.0,41.73]

# Override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__(api)
        self.counter = 0

    def on_status(self, status):
        record = {'Text': status.text, 'Coordinates': status.coordinates, 'Created At': status.created_at}
        self.counter += 1
        if self.counter <= 50:
            print record
            return True
        # Returning False disconnects the stream once 50 tweets have been printed
        return False

    def on_error(self, status_code):
        if status_code == 420:
            # Returning False in on_error disconnects the stream
            return False

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, listener=myStreamListener)
myStream.filter(locations=box, async=True)
print myStream

Here is the result:

{'Text': u"What?...", 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 6), 'Coordinates': {u'type': u'Point', u'coordinates': [-74.1234567, 40.1234567]}}
{'Text': u'WHEN?...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 8), 'Coordinates': None}
{'Text': u'Wooo...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 9), 'Coordinates': None}
{'Text': u'Man...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 9), 'Coordinates': None}
{'Text': u'The...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 10), 'Coordinates': None}

Answer Source

From the docs:

Only geolocated Tweets falling within the requested bounding boxes will be included—unlike the Search API, the user’s location field is not used to filter Tweets.

That guarantees that the tweets in the response are from the bounding box provided.

How does the bounding box filter work?

The streaming API uses the following heuristic to determine whether a given Tweet falls within a bounding box:

  • If the coordinates field is populated, the values there will be tested against the bounding box. Note that this field uses geoJSON order (longitude, latitude).

  • If coordinates is empty but place is populated, the region defined in place is checked for intersection against the locations bounding box. Any overlap will match.

  • If none of the rules listed above match, the Tweet does not match the location query.
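
The heuristic quoted above can be sketched in plain Python. This is only an illustration of the described rules, not Twitter's actual implementation. The helper names (point_in_box, boxes_overlap, place_extent, matches_locations) are hypothetical, and the sketch assumes the tweet is a dict shaped like the raw JSON payload, with place['bounding_box']['coordinates'][0] holding a ring of [longitude, latitude] corners.

```python
def point_in_box(lon, lat, box):
    """Return True if (lon, lat) lies inside box = [west, south, east, north]."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def boxes_overlap(a, b):
    """Return True if two [west, south, east, north] boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def place_extent(place):
    """Collapse a place's corner ring into [west, south, east, north].

    Assumption: the ring lives at place['bounding_box']['coordinates'][0].
    """
    ring = place['bounding_box']['coordinates'][0]
    lons = [corner[0] for corner in ring]
    lats = [corner[1] for corner in ring]
    return [min(lons), min(lats), max(lons), max(lats)]


def matches_locations(tweet, box):
    """Apply the quoted rules in order: coordinates, then place, else no match."""
    coords = tweet.get('coordinates')
    if coords:
        # geoJSON order: longitude first, then latitude
        lon, lat = coords['coordinates']
        return point_in_box(lon, lat, box)
    place = tweet.get('place')
    if place:
        # Any overlap between the place region and the box counts as a match
        return boxes_overlap(place_extent(place), box)
    return False
```

A tweet with no coordinates whose place region merely overlaps the box passes the second rule, which is why matched tweets can still show None for coordinates.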

Again, this implies that the coordinates field can be None while the tweet still matches the filter: such tweets were matched through their place field, whose region overlaps the bounding box.


Edit: place is a field in the response similar to coordinates.
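
If only tweets with an exact point are wanted, one option is to skip records whose coordinates are None and count only the rest. A minimal sketch, assuming records shaped like the dicts printed in the question; keep_geotagged is a hypothetical helper, not part of Tweepy:

```python
def keep_geotagged(records, limit=50):
    """Return up to `limit` records whose 'Coordinates' value is not None."""
    kept = []
    for record in records:
        if len(kept) >= limit:
            break
        # Records matched only through place carry None here and are skipped
        if record.get('Coordinates') is not None:
            kept.append(record)
    return kept
```

In the listener from the question, the same check (status.coordinates is not None) could go at the top of on_status, before the counter is incremented. Note that this discards tweets matched through place, which may still come from the requested area.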
