David - 1 year ago 104
Python Question

# Modular arithmetic in python to iterate a pandas dataframe

Ok, I have a big dataframe such as:

``````      hour    value
0      0      1
1      6      2
2     12      3
3     18      4
4      0      5
5      6      6
6     12      7
7     18      8
8      6      9
9     12     10
10     18     11
11     12     12
12     18     13
13      0     14
``````

Let's don't get lost here. The column
`hour`
represents the hours of the day, from 6 to 6 hours. Column
`values`
is well, exactly that, here the values are as an example, not the actual ones.

If you look closely to the
`hour`
column, you can see that there are hours missing. For instance, there is a gap between rows 7 and 8 (the value of hour 0 is missing). There are also bigger gaps, such as in between rows 10 and 11 (hours 00 and 06).

What do I need? I would like to check when an hour (and of course) a value is missing, and complete the dataframe inserting a row there with the corresponding hour and a
`np.nan`
as value.

What have I thought? I think this would be easily solved using modular arithmetic, in this case with mod 24, such as when
`18 + 6 = 24 = 0 mod 24`
. So initializing the counter to zero and adding 6 with the caveat that the counter is defined in modular arithmetic mod 24 you can verify if each
`hour`
is the corresponding hour, and if not, insert a new row with the corresponding hour and with
`np.nan`
as value.

I don't know how to do the implementation of modular arithmetic in python to iterate a dataframe column.

Thank you very much.

# Solution

``````group_hours = (df.hour <= df.hour.shift()).cumsum()

def insert_missing_hours(df):
return df.set_index('hour').reindex([0, 6, 12, 18]).reset_index()

df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=1)
``````

Looks like:

``````    hour  value
0      0    1.0
1      6    2.0
2     12    3.0
3     18    4.0
4      0    5.0
5      6    6.0
6     12    7.0
7     18    8.0
8      0    NaN
9      6    9.0
10    12   10.0
11    18   11.0
12     0    NaN
13     6    NaN
14    12   12.0
15    18   13.0
16     0   14.0
17     6    NaN
18    12    NaN
19    18    NaN
``````

# Explanation

In order to apply `reindex` I needed to determine which rows to group. I checked to see if row's hour was less or equal than prior row's hour. If so, that flags a new group.

`insert_missing_hours` is precisely the `reindex` of subgroups with `[0, 6, 12, 18]`.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download