Linguistics Student - 9 months ago
Python Question

Tweepy Location Filter Does Not Work

PROBLEM SOLVED, SEE SOLUTION IN THE ACCEPTED POST

I am trying to collect 50 tweets that originate from a specified geographic region. My code below prints 50 tweets, but many of them have None for coordinates. Does this mean that the tweets with None were not generated in the specified area? Can you explain what is happening here, and how I can collect 50 tweets from this specified geographic area? Thanks in advance.

# Import Tweepy, sys, sleep, credentials.py
try:
    import json
except ImportError:
    import simplejson as json
import tweepy, sys
from time import sleep
from credentials import *

# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Assign coordinates to the variable
box = [-74.0,40.73,-73.0,41.73]

#override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__()
        self.counter = 0

    def on_status(self, status):
        record = {'Text': status.text, 'Coordinates': status.coordinates, 'Created At': status.created_at}
        self.counter += 1
        if self.counter <= 50:
            print record
            return True
        else:
            return False

    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, listener=myStreamListener)
myStream.filter(locations=box, async=True)
print myStream


Here is the result:

{'Text': u"What?...", 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 6), 'Coordinates': {u'type': u'Point', u'coordinates': [-74.1234567, 40.1234567]}}
{'Text': u'WHEN?...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 8), 'Coordinates': None}
{'Text': u'Wooo...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 9), 'Coordinates': None}
{'Text': u'Man...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 9), 'Coordinates': None}
{'Text': u'The...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 10), 'Coordinates': None}

Answer Source

From the docs:

Only geolocated Tweets falling within the requested bounding boxes will be included—unlike the Search API, the user’s location field is not used to filter Tweets.

That guarantees that the tweets in the response are from the bounding box provided.

How does the bounding box filter work?

The streaming API uses the following heuristic to determine whether a given Tweet falls within a bounding box:

  • If the coordinates field is populated, the values there will be tested against the bounding box. Note that this field uses geoJSON order (longitude, latitude).

  • If coordinates is empty but place is populated, the region defined in place is checked for intersection against the locations bounding box. Any overlap will match.

  • If none of the rules listed above match, the Tweet does not match the location query.
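
To make the heuristic concrete, here is a rough sketch of the same decision in Python. It is only an illustration of the rules quoted above, not Twitter's actual implementation. The function names are made up for this example, and the place region is simplified to a [west, south, east, north] box, whereas the real place field carries a polygon.

```python
def point_in_box(lon, lat, box):
    """True if (lon, lat) lies inside box = [west, south, east, north]."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def boxes_overlap(a, b):
    """True if two [west, south, east, north] boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches_locations(coordinates, place_box, box):
    """Apply the quoted rules to one tweet (hypothetical helper).

    coordinates: GeoJSON-style dict such as
                 {'type': 'Point', 'coordinates': [lon, lat]}, or None
    place_box:   [west, south, east, north] for the tweet's place, or None
                 (a simplification of the real polygon, assumed for this sketch)
    box:         the locations bounding box passed to the filter
    """
    if coordinates is not None:
        # Rule 1: exact coordinates are tested directly (longitude first).
        lon, lat = coordinates['coordinates']
        return point_in_box(lon, lat, box)
    if place_box is not None:
        # Rule 2: no coordinates, so any overlap with the place region matches.
        return boxes_overlap(place_box, box)
    # Rule 3: nothing to test against, so the tweet does not match.
    return False
```

With the box from the question, matches_locations(None, [-74.3, 40.5, -73.7, 40.9], [-74.0, 40.73, -73.0, 41.73]) takes the second branch, which is how a tweet with coordinates set to None can still pass the filter.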

Again, this implies that the coordinates field can be None while the tweet still matches: the bounding box filter only returns tweets whose coordinates fall inside the box or whose place overlaps it.

source: https://dev.twitter.com/streaming/overview/request-parameters#locations

Edit: place is a field in the response similar to coordinates.
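
If you want to see why a tweet with no coordinates was matched, you can record its place alongside the other fields. The sketch below assumes the tweet is available as a plain dict with the documented Tweet keys (text, coordinates, place, and inside place the full_name and bounding_box keys). The helper name location_record is made up for this example, and how you get that dict out of tweepy depends on your version (the status object has kept the raw payload in an attribute called _json in the versions I have seen, but treat that as an assumption).

```python
def location_record(tweet):
    """Build a printable record from a tweet payload dict (hypothetical helper).

    Falls back gracefully when 'place' or its 'bounding_box' is missing or None.
    """
    place = tweet.get('place') or {}
    bbox = place.get('bounding_box') or {}
    return {
        'Text': tweet.get('text'),
        'Coordinates': tweet.get('coordinates'),
        # Human-readable place name, e.g. a city or neighbourhood, if present.
        'Place': place.get('full_name'),
        # Polygon corners in (longitude, latitude) order, if present.
        'Place Box': bbox.get('coordinates'),
    }
```

A record whose Coordinates value is None but whose Place is filled in is a tweet that matched through the place rule rather than through an exact point.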