ventolin ventolin - 6 months ago 32
Python Question

Plotting a cumulative graph of python datetimes

Say I have a list of datetimes, and we know each datetime to be the recorded time of an event happening.

Is it possible in matplotlib to graph the frequency of this event occurring over time, showing this data in a cumulative graph (so that each point is greater than or equal to all of the points that went before it), without preprocessing this list? (e.g. passing datetime objects directly to some wonderful matplotlib function)

Or do I need to turn this list of datetimes into a list of dictionary items, such as:

{"year": 1998, "month": 12, "date": 15, "events": 92}

and then generate a graph from this list?

Sorry if this seems like a silly question - I'm not all too familiar with matplotlib, and would like to save myself the effort of doing this the latter way if matplotlib can already deal with datetime objects itself.

Answer

This should work for you:

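from numpy import arange
from matplotlib.pyplot import plot
counts = arange(0, len(list_of_dates))
plot(sorted(list_of_dates), counts)  # sort so the running count lines up with time order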

You can of course give any of the usual options to the plot call to make the graph look the way you want it. (I'll point out that matplotlib is very adept at handling dates and times.)
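To make that concrete, here is a minimal, self-contained sketch of the same idea. The helper names (cumulative_points, plot_cumulative) and the sample datetimes are my own inventions for illustration, and I count from 1 rather than 0 so that each y-value reads as "events so far". matplotlib is only imported inside the plotting helper, so the counting part runs on its own.

```python
from datetime import datetime


def cumulative_points(events):
    """Return (sorted_times, running_totals) for a list of datetimes."""
    times = sorted(events)
    totals = list(range(1, len(times) + 1))  # 1 after the first event, 2 after the second, ...
    return times, totals


def plot_cumulative(events):
    """Draw the running total against time; matplotlib accepts datetimes on the x-axis."""
    import matplotlib.pyplot as plt  # lazy import: only needed when actually plotting

    times, totals = cumulative_points(events)
    fig, ax = plt.subplots()
    ax.plot(times, totals, marker="o")
    ax.set_ylabel("events so far")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig


# Made-up sample data, deliberately out of order
sample_events = [
    datetime(1998, 12, 15, 9, 30),
    datetime(1998, 12, 14, 18, 5),
    datetime(1998, 12, 15, 23, 59),
    datetime(1998, 12, 17, 7, 0),
]
times, totals = cumulative_points(sample_events)
```

Calling plot_cumulative(sample_events) and then matplotlib.pyplot.show() should display the curve.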

Another option would be the hist function - it has an option 'cumulative=True' that might be useful. You can create a cumulative histogram showing the number of events that have occurred as of any given date with something like this:

from matplotlib.pyplot import hist
from matplotlib.dates import date2num
hist(date2num(list_of_dates), cumulative=True)

But this produces a bar chart, which might not be quite what you're looking for, and in any case making the date labels on the horizontal axis display properly will probably require some fudging.
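As a rough idea of what that fudging might look like, the sketch below uses one bin per calendar day and tells the axis to treat its numbers as dates. daily_bin_edges and plot_cumulative_hist are hypothetical helper names of mine, and the one-day bin width is an assumption, not something the question asked for.

```python
from datetime import datetime, time, timedelta


def daily_bin_edges(events):
    """Midnight-to-midnight bin edges covering every day from the first to the last event."""
    first = min(events).date()
    last = max(events).date()
    n_days = (last - first).days + 1
    start = datetime.combine(first, time.min)
    # n_days bins need n_days + 1 edges
    return [start + timedelta(days=i) for i in range(n_days + 1)]


def plot_cumulative_hist(events):
    """Cumulative histogram with one bar per day and readable date labels."""
    import matplotlib.pyplot as plt  # lazy import: only needed when actually plotting
    from matplotlib.dates import DateFormatter, date2num

    fig, ax = plt.subplots()
    ax.hist(date2num(events), bins=date2num(daily_bin_edges(events)), cumulative=True)
    ax.xaxis_date()  # interpret the numeric x values as dates
    ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d"))
    fig.autofmt_xdate()
    return fig


# Made-up sample data
sample_events = [
    datetime(1998, 12, 15, 9, 30),
    datetime(1998, 12, 14, 18, 5),
    datetime(1998, 12, 15, 23, 59),
    datetime(1998, 12, 17, 7, 0),
]
edges = daily_bin_edges(sample_events)
```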

EDIT: I'm getting the sense that what you really want is one point (or bar) per date, with the corresponding y-value being the number of events that have occurred up to (and including?) that date. In that case, I'd suggest doing something like this:

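import itertools
from numpy import cumsum
from matplotlib.pyplot import step
# One [date, count] pair per calendar day; groupby needs sorted input
grouped_dates = [[d, len(list(g))] for d, g in itertools.groupby(sorted(list_of_dates), lambda k: k.date())]
dates, counts = zip(*grouped_dates)  # a plain list has no transpose(), so unzip instead
counts = cumsum(counts)
step(dates, counts)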

The groupby function from the itertools module will produce the kind of data you're looking for: only a single instance of each date, accompanied by a list (an iterator, actually) of all the datetime objects that have that date. Note that groupby only merges adjacent items, so list_of_dates has to be sorted for this to give one group per date. As suggested by Jouni in the comments, the step function will give a graph that steps up at each day on which events occurred, so I'd suggest using that in place of plot.

(Hat tip to EOL for reminding me about cumsum)
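If it helps to see what the grouping actually produces, here is a small sketch on made-up data. It uses itertools.accumulate from the standard library in place of numpy's cumsum, purely so the counting part has no dependencies. cumulative_by_day and plot_steps are names I made up, and where="post" is my own choice (it keeps the line flat until the next date with events), not something from the snippet above.

```python
from datetime import datetime
from itertools import accumulate, groupby


def cumulative_by_day(events):
    """Return (days, running_totals), with one entry per day that has at least one event."""
    days, per_day = [], []
    # groupby only merges adjacent items, so sort before grouping
    for day, group in groupby(sorted(events), key=lambda e: e.date()):
        days.append(day)
        per_day.append(sum(1 for _ in group))  # number of events on this day
    return days, list(accumulate(per_day))


def plot_steps(events):
    """Step plot of the running total, one step per day with events."""
    import matplotlib.pyplot as plt  # lazy import: only needed when actually plotting

    days, totals = cumulative_by_day(events)
    fig, ax = plt.subplots()
    ax.step(days, totals, where="post")
    fig.autofmt_xdate()
    return fig


# Made-up sample data: one event on the 14th, two on the 15th, one on the 17th
sample_events = [
    datetime(1998, 12, 15, 9, 30),
    datetime(1998, 12, 14, 18, 5),
    datetime(1998, 12, 15, 23, 59),
    datetime(1998, 12, 17, 7, 0),
]
days, totals = cumulative_by_day(sample_events)
```

For this sample list the totals come out as 1, 3 and 4 for 14, 15 and 17 December.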

If you want to have one point for every day, regardless of whether any events occurred on that day or not, you'll need to alter the above code a bit:

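import itertools
from datetime import timedelta
from numpy import asarray
from matplotlib.pyplot import step
from matplotlib.dates import drange, num2date
# Events per calendar day (groupby needs sorted input)
date_dict = dict((d, len(list(g))) for d, g in itertools.groupby(sorted(list_of_dates), lambda k: k.date()))
dates = num2date(drange(min(list_of_dates).date(), max(list_of_dates).date() + timedelta(1), timedelta(1)))
counts = asarray([date_dict.get(d.date(), 0) for d in dates]).cumsum()
step(dates, counts)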

I don't think it'll really make a difference for the plot produced by the step function though.
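For comparison, the same gap-filling can be sketched with only the standard library. Here I've swapped in collections.Counter for groupby (so no sorting is needed) and built the day range by hand instead of using drange. cumulative_every_day is a hypothetical name and the data is made up.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import accumulate


def cumulative_every_day(events):
    """Return (days, running_totals) with an entry for every day from first to last event."""
    per_day = Counter(e.date() for e in events)  # a missing day counts as 0
    first, last = min(per_day), max(per_day)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    totals = list(accumulate(per_day[d] for d in days))
    return days, totals


# Made-up sample data: nothing happens on the 16th
sample_events = [
    datetime(1998, 12, 15, 9, 30),
    datetime(1998, 12, 14, 18, 5),
    datetime(1998, 12, 15, 23, 59),
    datetime(1998, 12, 17, 7, 0),
]
days, totals = cumulative_every_day(sample_events)
```

The 16th shows up with the same total as the 15th (3), which is the flat stretch the step plot would draw anyway.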
