NewToPython43532 - 16 days ago
Python Question

Adding up scores of sentences depending on region (Python, very lost)

I have a txt file containing over 200 tweets, and I'm trying to calculate the total score for all the tweets in a particular region, given their latitude and longitude. A typical tweet looks like:

[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar


I've done this before, but only calculating a score for each individual sentence. Side note: I had a file containing words and another containing sentences, and I had to check whether any of the words appear in the sentences, then add up the number of matching words and their sentiment values (a value associated with each word). The words file looked like:

Happy, 1
Sad, 5

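from collections import Counter

# Build a word -> sentiment value lookup from lines such as "Happy, 1"
with open('words.txt') as f:
    sentiments = {word: int(value)
                  for word, value in
                  (line.split(",") for line in f)}

with open('sentences.txt') as f:
    for line in f:
        # Count how often each known sentiment word occurs in the sentence
        values = Counter(word for word in line.split() if word in sentiments)
        if not values:
            continue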


But with the whole long/lat business, I don't know how to add up all the scores in a particular region, mostly because I'm confused about the longitude and latitude.

So, first I tried to "approximate" regions corresponding to their timezones (not real data): Eastern (P1, P2, P3, P4), Central (P3, P4, P5, P6), Mountain (P5, P6, P7, P8), and Pacific (P7, P8, P9, P10).

So with this info:

p1 = (49.189787, -67.444574)
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395)
p4 = (24.660845, -87.518395)
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892)
p7 = (49.189787, -115.236428)
p8 = (24.660845, -115.236428)
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)


I determined the regions as

class Region:
    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple
        self.long_tuple = long_tuple

    def contains(self, lat, long):
        return self.lat_tuple[0] <= lat and lat < self.lat_tuple[1] and \
               self.long_tuple[0] <= long and long < self.long_tuple[1]

eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))


I think I have gotten most of it done, but I just don't know how to tell whether a tweet is in a given region. I need help adding up all the scores of the sentences in a particular region, or just an outline.

Answer

I didn't check your coordinates completely, but you seem to be on the right track. Using what you did, all you would need to do to parse the tweet file is:

scores = {'eastern':0,'central':0,'pacific':0,'mountain':0}
with open('tweets.txt') as f:
    for line in f:
        line = line.split()  # Split on whitespace, which also drops the trailing newline
        lat  = float(line[0][1:-1])  # Stripping the [ and the ,
        long = float(line[1][:-1])   # Stripping the ]
        if eastern.contains(lat,long):
            scores['eastern'] += score(line)  # Assuming you have a score function
        elif central.contains(lat,long):
            scores['central'] += score(line)
        elif mountain.contains(lat,long):
            scores['mountain'] += score(line)
        elif pacific.contains(lat,long):
            scores['pacific'] += score(line)
        else:
            raise ValueError("Could not locate coordinates "+line[0]+line[1])
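
The slicing above depends on the coordinates being exactly the first two space-separated tokens. As a side note, here is a slightly more defensive sketch for pulling one line apart. It assumes every line follows the layout of the sample tweet in the question (bracketed coordinates, then a number, a date, a time, and the text); `parse_tweet` is a hypothetical helper name, not something from the original code.

```python
def parse_tweet(line):
    """Split one tweet line into (lat, long, words).

    Assumed layout, taken from the sample tweet in the question:
    [lat, long] <number> <date> <time> <text>
    Raises ValueError if the coordinates cannot be parsed.
    """
    # Everything before the first "]" is the coordinate pair
    coords, _, rest = line.partition("]")
    lat_text, long_text = coords.lstrip("[").split(",")

    # Skip the number, date and time fields; whatever is left is the text
    fields = rest.split(None, 3)
    text = fields[3] if len(fields) > 3 else ""

    return float(lat_text), float(long_text), text.split()
```

Returning only the words of the tweet text keeps the coordinates, date and time out of any later word counting.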

You could make the loop more elegant by wrapping the if statements in a function:

def region(lat,long):
    # Define your regions here, inside the function, or leave them as globals
    if eastern.contains(lat,long):  return 'eastern'
    if central.contains(lat,long):  return 'central'         
    if mountain.contains(lat,long): return 'mountain'
    if pacific.contains(lat,long):  return 'pacific'
    raise ValueError(" ".join(("could not locate coordinates",str(lat),str(long))))

Then the if statements in the loop are gone:

scores[region(lat,long)] += score(line)
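
As a variation on the `region` function, the regions could also live in a dictionary so that the lookup becomes a loop instead of a chain of if statements. This is only a sketch: the `Region` class and the bounds are copied from the question, while the `REGIONS` name and the `lon` parameter name are my own choices for illustration.

```python
class Region:
    """Rectangular region, as defined in the question."""

    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple
        self.long_tuple = long_tuple

    def contains(self, lat, lon):
        # Half-open intervals: the lower bound is included, the upper is not
        return (self.lat_tuple[0] <= lat < self.lat_tuple[1]
                and self.long_tuple[0] <= lon < self.long_tuple[1])


# Region bounds copied from the question; REGIONS is an illustrative name
REGIONS = {
    'eastern': Region((24.660845, 49.189787), (-87.518395, -67.444574)),
    'central': Region((24.660845, 49.189787), (-101.998892, -87.518395)),
    'mountain': Region((24.660845, 49.189787), (-115.236428, -101.998892)),
    'pacific': Region((24.660845, 49.189787), (-125.242264, -115.236428)),
}


def region(lat, lon):
    """Return the name of the region containing the point."""
    for name, area in REGIONS.items():
        if area.contains(lat, lon):
            return name
    raise ValueError("could not locate coordinates %s %s" % (lat, lon))
```

With these bounds, the sample tweet from the question (30.346168930000001, -97.73518) falls in `central`. Adding another region then only means adding one dictionary entry.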

EDIT: you need to define score as a function that accepts a tweet, or rather the split line from my code above (which is a list of words, including the coordinates):

def score(tweet):
    total = 0
    for word in tweet:
        if word in sentiments:
            total += 1
    return total/(len(tweet)-2)  # Subtract the two coordinate tokens from the length

Assuming the global sentiments is defined beforehand.
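
Note that the `score` function above counts matching words rather than adding their values, its divisor still includes the number, date and time tokens, and it would divide by zero for a line holding only the coordinates. If the goal is instead to add up the sentiment values themselves, as the question describes, a minimal sketch could look like this. `sentiment_score` is a hypothetical name, and the table is passed in as an argument rather than read from a global.

```python
def sentiment_score(words, sentiments):
    """Sum the sentiment values of every word found in the table.

    words      -- list of words from the tweet text only
    sentiments -- dict mapping a word to its integer sentiment value
    """
    return sum(sentiments[word] for word in words if word in sentiments)


# Made-up sample values, in the shape of the words file from the question
sample_sentiments = {'Happy': 1, 'Sad': 5}
total = sentiment_score(['So', 'Happy', 'but', 'Sad', 'too'], sample_sentiments)
```

Matching here is exact, so `Happy` and `happy` count as different words; whether to normalize case or strip punctuation depends on how the words file is written.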
