I have a Python server that accepts time series data. I need to calculate the average traffic over the last minute, with output like "90 samples/minute". I'm currently using a Python list to hold all the timestamps, and the way I calculate this is pretty awful in my opinion. The code roughly looks like this:
self.timestamps = []

# this gets one record of traffic
data = self.accept_data()
# get the record's timestamp
timestamp = data.timestamp
# add it to the list
self.timestamps.append(timestamp)
# get the time one minute ago
minute_ago = timestamp - datetime.timedelta(minutes=1)
# find the first index of a past timestamp that's within the last minute
for i, t in enumerate(self.timestamps):
    if t > minute_ago:
        break
# see how many records are within the last minute
result = len(self.timestamps[i:])
# throw away the earlier data
self.timestamps = self.timestamps[i:]
Use a queue to hold <traffic, timestamp> pairs, where timestamp is the time the record was pushed onto the queue (when it arrives from the server). Track the running sum of the traffic values in the queue. When a new record arrives, keep popping from the front of the queue while the difference between the new timestamp and the front element's timestamp is more than 1 minute, subtracting each popped traffic value from the sum. Then push the new record onto the queue and add its value to the sum.
This way, your queue works as a window frame holding the last minute of traffic at all times. Since you are tracking the sum and you know the queue size, you can calculate the average.
The space complexity is O(maximum number of records that can arrive within 1 minute). The time complexity for getting the average at any time is O(1).
This is a very conventional algorithm for answering queries on a running stream of data in constant time.
Note: unfortunately I don't know Python, otherwise I would include an implementation.
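A minimal Python sketch of the queue-based window described in this answer, using collections.deque; the class and method names here are my own, not from the original post:

```python
from collections import deque
import time


class TrafficWindow:
    """Sliding window over a stream of (traffic, timestamp) pairs."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.queue = deque()   # holds (traffic, timestamp) pairs, oldest first
        self.total = 0         # running sum of traffic values in the queue

    def add(self, traffic, timestamp=None):
        if timestamp is None:
            timestamp = time.time()
        # Evict records older than the window before adding the new one.
        while self.queue and timestamp - self.queue[0][1] > self.window:
            old_traffic, _ = self.queue.popleft()
            self.total -= old_traffic
        self.queue.append((traffic, timestamp))
        self.total += traffic

    def average(self):
        # Average traffic per record over the window; 0 if the queue is empty.
        return self.total / len(self.queue) if self.queue else 0
```

For example, with a 60-second window, adding records at t=0, t=30, and t=90 evicts the t=0 record on the third insert, so the average covers only the two records still inside the window. The running sum is what keeps `average()` O(1): no re-scan of the queue is ever needed.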