tktktk0711 tktktk0711 - 1 year ago 44
Python Question

python2 and pandas: get all occurrence of day during a year in a dataframe

I have a large number of rows dataframe(df_m) as below.

I want to plot the number of occurrence of day for years(2010-2017) of

date_m
column in the dataframe. Since the year range of
date_m
is from 2010-2017, and sometimes the day is more than one time, for example,
2010-07-26
there are two times, so the occurrence of
2010-07-26
is
2
, but sometimes, the day have no data, for example
2010-7-21
is
0
, I want to count the occurrence of days during a whole year(from 2010-2017), when the day is in the dataframe, and count the occurrence, when is not in the dataframe, the occurrence is 0, please check the dataframe form as below. And Finally want to the days occurrence every month for years as shown the following figure.

Could you give me some hints how to do it, thanks!

db num date_a date_m date_c zip_b zip_a
0 old HKK10032 2010-07-14 2010-07-26 NaT NaN NaN
1 old HKK10109 2011-07-14 2011-09-15 NaT NaN NaN
2 old HNN10167 2012-07-15 2012-08-09 NaT 177-003 NaN
3 old HKK10190 2013-07-15 2013-09-02 NaT NaN NaN
4 old HKK10251 2014-07-16 2014-05-02 NaT NaN NaN
5 old HKK10253 2015-07-16 2015-05-01 NaT NaN NaN
6 old HNN10275 2017-07-16 2017-07-18 2010-07-18 1070062 NaN
7 old HKK10282 2017-07-16 2017-08-16 NaT NaN NaN
8 old HKK10032 2010-07-14 2010-07-26 NaT NaN NaN
9 old HKK10109 2011-07-14 2011-09-15 NaT NaN NaN
....


The dataframe form I want to get

2010 2011 2012 2013 2014 2015 2016 2017
1 0 1 0 1 0 2 1 0
2 0 0 0 0 0 0 0 0
3 0 0 1 0 0 2 0 0
4 0 16 14 47 37 37 26 31
5 0 18 11 29 29 24 16 23
6 0 13 8 31 33 24 16 32
7 0 14 31 25 17 21 24 70
8 0 16 25 14 21 27 35 59
9 0 16 10 22 16 31 54 42
10 0 15 11 22 37 48 53 29
11 0 13 10 39 43 41 28 36
12 0 5 7 51 44 30 34 24
13 0 10 13 65 17 30 32 48
14 0 15 26 22 27 31 25 85
15 0 22 22 38 31 30 36 71
16 0 22 19 41 36 33 50 41
17 0 14 19 32 34 43 55 27
18 0 12 17 32 54 37 22 32
19 0 7 14 76 41 32 24 36
20 1 7 22 55 34 30 39 57
21 0 14 39 50 28 26 31 100
22 0 20 32 28 35 30 45 80
23 0 18 19 46 26 31 59 52
24 0 12 20 33 47 59 45 51
25 0 16 22 55 49 48 47 40
26 0 13 17 77 43 40 45 50
27 0 18 33 70 37 39 45 85
28 0 17 34 59 36 46 58 111
29 0 23 27 53 36 39 70 91
30 0 28 37 55 45 52 82 77
.. ... ... ... ... ... ... ... ...
337 19 37 53 40 38 27 55 0
338 29 18 61 25 34 33 69 0
339 18 13 21 27 34 52 54 0
340 17 14 38 33 44 48 48 0
341 14 15 22 48 43 36 38 0
342 13 10 28 49 35 31 36 0
343 12 21 62 33 23 31 43 0
344 20 37 45 29 31 41 45 0
345 15 26 50 19 27 47 76 0
346 24 22 29 27 36 63 71 0
347 16 14 38 43 41 51 33 0
348 13 19 46 52 38 49 34 0
349 22 19 42 46 30 50 49 0
350 17 28 75 54 42 40 42 0
351 22 41 61 34 44 36 44 0
352 23 34 55 42 37 57 79 0
353 30 34 51 45 38 65 81 0
354 25 32 49 57 43 59 61 0
355 22 29 64 50 53 66 62 0
356 24 34 62 54 46 64 66 0
357 19 39 76 47 49 74 79 0
358 29 33 63 44 46 51 83 0
359 28 24 53 38 55 49 82 0
360 26 35 58 46 42 66 63 0
361 27 35 73 32 38 72 70 0
362 21 27 64 42 41 60 66 0
363 23 28 52 47 44 39 61 0
364 16 16 40 27 17 32 67 0
365 0 1 31 0 14 13 38 0
366 0 0 1 0 0 0 17 0

[366 rows x 8 columns]


figure

Answer Source

You can use:

df1 = df.groupby('date_b')['date_b'].count()
        .reindex(pd.date_range('2017-01-01', '2017-12-31', freq='D'), fill_value=0)

EDIT:

Need groupby by year and day and aggregate count. Then create new MultiIndex.from_product and reindex original, last reshape by unstack:

#if necessary convert to datetime
df['date_m'] = pd.to_datetime(df['date_m'])
df1 = df.groupby([df['date_m'].dt.year, df['date_m'].dt.day])['date_m'].count()
mux = pd.MultiIndex.from_product([range(2010, 2018), range(1, 32)])
df1 = df1.reindex(mux, fill_value=0)
df1 = df1.unstack(0)

print (df1)
    2010  2011  2012  2013  2014  2015  2016  2017
1      0     0     0     0     0     1     0     0
2      0     0     0     1     1     0     0     0
3      0     0     0     0     0     0     0     0
4      0     0     0     0     0     0     0     0
5      0     0     0     0     0     0     0     0
6      0     0     0     0     0     0     0     0
7      0     0     0     0     0     0     0     0
8      0     0     0     0     0     0     0     0
9      0     0     1     0     0     0     0     0
10     0     0     0     0     0     0     0     0
11     0     0     0     0     0     0     0     0
12     0     0     0     0     0     0     0     0
13     0     0     0     0     0     0     0     0
14     0     0     0     0     0     0     0     0
15     0     2     0     0     0     0     0     0
16     0     0     0     0     0     0     0     1
17     0     0     0     0     0     0     0     0
18     0     0     0     0     0     0     0     1
19     0     0     0     0     0     0     0     0
20     0     0     0     0     0     0     0     0
21     0     0     0     0     0     0     0     0
22     0     0     0     0     0     0     0     0
23     0     0     0     0     0     0     0     0
24     0     0     0     0     0     0     0     0
25     0     0     0     0     0     0     0     0
26     2     0     0     0     0     0     0     0
27     0     0     0     0     0     0     0     0
28     0     0     0     0     0     0     0     0
29     0     0     0     0     0     0     0     0
30     0     0     0     0     0     0     0     0
31     0     0     0     0     0     0     0     0

df1.plot()

EDIT: Need dayofyear:

df['date_m'] = pd.to_datetime(df['date_m'])
df1 = df.groupby([df['date_m'].dt.year, df['date_m'].dt.dayofyear])['date_m'].count()
mux = pd.MultiIndex.from_product([range(2010, 2018), range(1, 367)])
df1 = df1.reindex(mux, fill_value=0)
df1 = df1.unstack(0)
print (df1)
     2010  2011  2012  2013  2014  2015  2016  2017
1       0     0     0     0     0     0     0     0
2       0     0     0     0     0     0     0     0
3       0     0     0     0     0     0     0     0
4       0     0     0     0     0     0     0     0
5       0     0     0     0     0     0     0     0
6       0     0     0     0     0     0     0     0
7       0     0     0     0     0     0     0     0
8       0     0     0     0     0     0     0     0
9       0     0     0     0     0     0     0     0
10      0     0     0     0     0     0     0     0
11      0     0     0     0     0     0     0     0
12      0     0     0     0     0     0     0     0
...
...

df1.plot()
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download