vera vera - 3 months ago 9
Python Question

Can I create a new column based on when the value changes in another column?

Let s say I have this

df


print(df)
DATE_TIME A B
0 10/08/2016 12:04:56 1 5
1 10/08/2016 12:04:58 1 6
2 10/08/2016 12:04:59 2 3
3 10/08/2016 12:05:00 2 2
4 10/08/2016 12:05:01 3 4
5 10/08/2016 12:05:02 3 6
6 10/08/2016 12:05:03 1 3
7 10/08/2016 12:05:04 1 2
8 10/08/2016 12:05:05 2 4
9 10/08/2016 12:05:06 2 6
10 10/08/2016 12:05:07 3 4
11 10/08/2016 12:05:08 3 2


The values in column
['A']
repeat over time, I need a column though, where they have a new ID each time they change, so that I would have something like the following
df


print(df)
DATE_TIME A B C
0 10/08/2016 12:04:56 1 5 1
1 10/08/2016 12:04:58 1 6 1
2 10/08/2016 12:04:59 2 3 2
3 10/08/2016 12:05:00 2 2 2
4 10/08/2016 12:05:01 3 4 3
5 10/08/2016 12:05:02 3 6 3
6 10/08/2016 12:05:03 1 3 4
7 10/08/2016 12:05:04 1 2 4
8 10/08/2016 12:05:05 2 4 5
9 10/08/2016 12:05:06 2 6 5
10 10/08/2016 12:05:07 3 4 6
11 10/08/2016 12:05:08 3 2 6


Is there a way to do this with python? I am still very new to this and hoped to find something that could help me in pandas, but I have not found anything yet. In my original dataframe the values in Column
['A']
change on irregular intervals approximately every ten minutes and not every two rows like in my example. Has anybody an idea how I could approach this task? Thank you

Answer

You can use the shift-cumsum pattern.

df['C'] = (df.A != df.A.shift()).cumsum()

>>> df
              DATE_TIME  A  B  C
0  10/08/2016  12:04:56  1  5  1
1  10/08/2016  12:04:58  1  6  1
2  10/08/2016  12:04:59  2  3  2
3  10/08/2016  12:05:00  2  2  2
4  10/08/2016  12:05:01  3  4  3
5  10/08/2016  12:05:02  3  6  3
6  10/08/2016  12:05:03  1  3  4
7  10/08/2016  12:05:04  1  2  4
8  10/08/2016  12:05:05  2  4  5
9  10/08/2016  12:05:06  2  6  5
10 10/08/2016  12:05:07  3  4  6
11 10/08/2016  12:05:08  3  2  6

As a side note, this is a popular pattern for grouping. For example, to get the average B value of each such group:

df.groupby((df.A != df.A.shift()).cumsum()).B.mean()
Comments