David - 4 months ago 28

Python Question

Ok, I have a big dataframe such as:

```
    hour  value
0      0      1
1      6      2
2     12      3
3     18      4
4      0      5
5      6      6
6     12      7
7     18      8
8      6      9
9     12     10
10    18     11
11    12     12
12    18     13
13     0     14
```

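Let's not get lost here. The column `hour` holds the hour of the day in 6-hour steps, and the column `value` holds the values.

If you look closely at the `hour` column, some hours are missing. Where an hour is missing, I want to insert a row with `np.nan` as its value. The hours wrap around modulo 24, so `18 + 6 = 24 = 0 mod 24`: the `hour` after 18 should be 0, and every step that is skipped should get a row with `np.nan`.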

I don't know how to implement modular arithmetic in Python to iterate over a DataFrame column.

Thank you very much.
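For reference, the wrap-around arithmetic in the question can be sketched with the `%` operator. This is only an illustration on the sample data above: the names `STEP`, `expected_next`, and `skipped` are made up for the example, and the 6-hour spacing is an assumption read off the sample. It counts how many 6-hour slots are skipped before each row; it does not insert the rows.

```python
import pandas as pd

STEP = 6  # assumed spacing between consecutive hours

df = pd.DataFrame({
    "hour": [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    "value": range(1, 15),
})

# Hour that each row's successor should have; % 24 wraps 18 + 6 back to 0.
expected_next = (df["hour"] + STEP) % 24

# Distance (mod 24) from the expected hour to the actual hour, in steps.
# The first row has no predecessor, so its NaN is treated as "nothing skipped".
skipped = ((df["hour"] - expected_next.shift()) % 24 // STEP).fillna(0).astype(int)

print(skipped.tolist())
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0]
```

A gap of a whole day (for example 18 followed by the 0 of two days later) cannot be seen from the `hour` column alone, so this count is a lower bound.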

Answer

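```
# Start a new group whenever the hour does not increase from the previous row.
group_hours = (df.hour <= df.hour.shift()).cumsum()

def insert_missing_hours(df):
    # Reindex one group onto the full cycle; missing hours become NaN rows.
    return df.set_index('hour').reindex([0, 6, 12, 18]).reset_index()

df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)
```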

Looks like:

```
hour value
0 0 1.0
1 6 2.0
2 12 3.0
3 18 4.0
4 0 5.0
5 6 6.0
6 12 7.0
7 18 8.0
8 0 NaN
9 6 9.0
10 12 10.0
11 18 11.0
12 0 NaN
13 6 NaN
14 12 12.0
15 18 13.0
16 0 14.0
17 6 NaN
18 12 NaN
19 18 NaN
```

In order to apply `reindex`, I needed to determine which rows to group. I checked whether each row's hour was less than or equal to the prior row's hour. If so, that flags a new group.
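To make the grouping step concrete, here is a small sketch that builds the group key on the `hour` values from the question. The names `hours` and `starts_new_group` are only for illustration.

```python
import pandas as pd

hours = pd.Series([0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0], name="hour")

# True wherever the hour fails to increase, i.e. a new 0..18 cycle begins.
# The first row compares against NaN, which evaluates to False.
starts_new_group = hours <= hours.shift()

# The running count of True flags gives every cycle its own integer label.
group_hours = starts_new_group.cumsum()

print(group_hours.tolist())
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
```

Rows 8 to 10 (hours 6, 12, 18) end up in group 2 without an hour 0, which is exactly the kind of hole that `reindex` fills afterwards.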

`insert_missing_hours` is precisely the `reindex` of each subgroup with `[0, 6, 12, 18]`.
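For anyone who wants to run this end to end, here is a self-contained sketch on the sample data from the question. It is a variation rather than the exact code above: it loops over the groups and glues them together with `pd.concat` instead of `groupby(...).apply(...)`, and it passes a named `pd.Index` to `reindex` so the `hour` column name is set explicitly. `HOURS` and `result` are names introduced for the example.

```python
import pandas as pd

HOURS = pd.Index([0, 6, 12, 18], name="hour")  # the full daily cycle

df = pd.DataFrame({
    "hour": [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    "value": range(1, 15),
})

# New group whenever the hour does not increase from the previous row.
group_hours = (df["hour"] <= df["hour"].shift()).cumsum().rename("group")

def insert_missing_hours(group):
    # Reindex one group onto the full cycle; missing hours become NaN rows.
    return group.set_index("hour").reindex(HOURS).reset_index()

# Process each group separately, then stack the pieces with a fresh index.
result = pd.concat(
    [insert_missing_hours(group) for _, group in df.groupby(group_hours)],
    ignore_index=True,
)

print(result)
```

On this data the result has 20 rows, with `NaN` in `value` at positions 8, 12, 13, 17, 18 and 19, matching the output shown above. Note that the last group is padded out to a full cycle, so trailing `NaN` rows appear even though nothing follows them in the input.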