David David - 28 days ago 10
Python Question

Modular arithmetic in python to iterate a pandas dataframe

Ok, I have a big dataframe such as:

hour value
0 0 1
1 6 2
2 12 3
3 18 4
4 0 5
5 6 6
6 12 7
7 18 8
8 6 9
9 12 10
10 18 11
11 12 12
12 18 13
13 0 14


Let's don't get lost here. The column
hour
represents the hours of the day, from 6 to 6 hours. Column
values
is well, exactly that, here the values are as an example, not the actual ones.

If you look closely to the
hour
column, you can see that there are hours missing. For instance, there is a gap between rows 7 and 8 (the value of hour 0 is missing). There are also bigger gaps, such as in between rows 10 and 11 (hours 00 and 06).

What do I need? I would like to check when an hour (and of course) a value is missing, and complete the dataframe inserting a row there with the corresponding hour and a
np.nan
as value.

What have I thought? I think this would be easily solved using modular arithmetic, in this case with mod 24, such as when
18 + 6 = 24 = 0 mod 24
. So initializing the counter to zero and adding 6 with the caveat that the counter is defined in modular arithmetic mod 24 you can verify if each
hour
is the corresponding hour, and if not, insert a new row with the corresponding hour and with
np.nan
as value.

I don't know how to do the implementation of modular arithmetic in python to iterate a dataframe column.

Thank you very much.

Answer

Solution

group_hours = (df.hour <= df.hour.shift()).cumsum()

def insert_missing_hours(df):
    return df.set_index('hour').reindex([0, 6, 12, 18]).reset_index()

df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=1)

Looks like:

    hour  value
0      0    1.0
1      6    2.0
2     12    3.0
3     18    4.0
4      0    5.0
5      6    6.0
6     12    7.0
7     18    8.0
8      0    NaN
9      6    9.0
10    12   10.0
11    18   11.0
12     0    NaN
13     6    NaN
14    12   12.0
15    18   13.0
16     0   14.0
17     6    NaN
18    12    NaN
19    18    NaN

Explanation

In order to apply reindex I needed to determine which rows to group. I checked to see if row's hour was less or equal than prior row's hour. If so, that flags a new group.

insert_missing_hours is precisely the reindex of subgroups with [0, 6, 12, 18].

Comments