I am using Python 3.4 and the tweepy API to extract tweets to a text file, but instead of only the tweets, the entire source of the page is being extracted. Is there a way to get only the tweets and not the entire source?
The tweets come in formatted as JSON, so import a JSON module into your script to decode them. (By the way, this is Python 2.7, so you will have to do the print differently.)
from tweepy.utils import import_simplejson
json = import_simplejson()
Load each tweet with Python and make sure that the data_temp['entities']['hashtags'] field is not empty (so the tweet has a hashtag):
data_temp = json.loads(data)
if data_temp['entities']['hashtags']:
    tweet_text = data_temp["text"].encode('utf-8')
Then write the tweet text out to your file.
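For what it's worth, here is a rough sketch of the whole flow in Python 3, assuming each incoming message is a raw JSON string like `data` in the snippet above. The function names and the output path are made up for illustration, and it uses the standard-library json module instead of import_simplejson, so treat it as a starting point rather than the way tweepy expects you to do it.

```python
import json


def extract_hashtag_tweets(raw_messages):
    """Yield the text of each message that is a tweet with at least one hashtag."""
    for raw in raw_messages:
        data_temp = json.loads(raw)
        # Non-tweet messages (for example delete notices) may lack these keys.
        hashtags = (data_temp.get("entities") or {}).get("hashtags")
        if hashtags and "text" in data_temp:
            yield data_temp["text"]


def write_tweets(raw_messages, path):
    """Write one tweet per line to a UTF-8 text file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        for text in extract_hashtag_tweets(raw_messages):
            # Flatten embedded newlines so each tweet stays on a single line.
            out.write(text.replace("\n", " ") + "\n")


if __name__ == "__main__":
    # Hand-made sample messages standing in for what the stream delivers.
    sample = [
        json.dumps({"text": "hello #world",
                    "entities": {"hashtags": [{"text": "world"}]}}),
        json.dumps({"text": "no tags here", "entities": {"hashtags": []}}),
    ]
    write_tweets(sample, "tweets.txt")
```

In Python 3, opening the file with encoding="utf-8" takes the place of the .encode('utf-8') call, so only the tweet text ends up in the file and not the surrounding JSON.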
These are only snippets; use this guy's script to help you get going and source the tweets. Big ups to him: https://github.com/bwbaugh/twitter-corpus