marupav marupav - 4 months ago 6
Python Question

Create new variable based on groupby in Python

I have a dataframe as below:

I/P:

id day start location value match
4413869 16080 360 5000 2 1
4413869 16080 360 5000 3 1
4413869 16080 360 5000 5 1
4413869 16080 360 5000 16 1
4413869 16080 360 5015 1 1
4413869 16080 361 -1 1 0
4413869 16080 361 -1 2 0
4413869 16080 361 -1 3 0
4413869 16080 361 -1 5 0
4413869 16080 361 -1 16 0
4413869 16080 362 -1 1 0
4413869 16080 362 -1 2 0
4413869 16080 362 -1 3 0
4413869 16080 362 -1 5 0
4413869 16080 362 -1 16 0
4413869 16080 363 -1 1 0
4413869 16080 363 -1 2 0
4413869 16080 363 -1 3 0
4413869 16080 363 -1 5 0
4413869 16080 363 -1 16 0
4413869 16080 364 -1 1 0
4413869 16080 364 -1 2 0
4413869 16080 364 -1 3 0
4413869 16080 364 -1 5 0
4413869 16080 364 -1 16 0


I need to do the following to get my O/P:


  1. Group the rows by the combination of id + day + start + location.

  2. At the first row of each group, initialize new_var = 0.

  3. For each row in the group, starting from the first, if match is 1 then new_var = new_var + 1.

  4. Continue incrementing until the last row of the group.

  5. Write only the last record of each group to the output.



O/P:

id day start location value match new_var
4413869 16080 360 5000 16 1 4
4413869 16080 360 5015 1 1 1
4413869 16080 361 -1 16 0 0
4413869 16080 362 -1 16 0 0
4413869 16080 363 -1 16 0 0
4413869 16080 364 -1 16 0 0


I know that I can use the groupby function, but I can't think of how to iterate and increment within each group.

Can anyone guide me?

Thanks.

Answer

Use cumsum over match in your groupby to get a running count of matches within each group:

df['new_var'] = df.groupby(['id', 'day', 'start', 'location'])['match'].cumsum()
df.head()
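
To make the running count concrete, here is a minimal sketch on a made-up frame (the ids, locations and match values below are illustrative only, not the data from the question). It shows the counter restarting whenever the group key changes:

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the per-group running sum.
toy = pd.DataFrame({
    "id": [1, 1, 1, 1, 1],
    "location": [5000, 5000, 5000, 5015, -1],
    "match": [1, 1, 0, 1, 0],
})

# cumsum() is computed separately inside each (id, location) group,
# so the count starts again from zero for every new group.
toy["new_var"] = toy.groupby(["id", "location"])["match"].cumsum()
print(toy)
```

Rows where match is 0 simply carry the previous count forward, which matches the "increment only if match is 1" rule in the question.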

Then use last in another groupby to keep only the final row of each group:

df.groupby(['id', 'day', 'start', 'location'], as_index=False).last()
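
Putting both steps together, the following is a sketch that rebuilds the input from the question and should reproduce the requested output. The way the frame is constructed and the names `rows`, `keys` and `out` are my own; `sort=False` is an optional assumption added to keep the groups in their order of first appearance.

```python
import pandas as pd

# Rebuild the sample input from the question.
rows = [(4413869, 16080, 360, 5000, v, 1) for v in (2, 3, 5, 16)]
rows.append((4413869, 16080, 360, 5015, 1, 1))
for start in (361, 362, 363, 364):
    rows += [(4413869, 16080, start, -1, v, 0) for v in (1, 2, 3, 5, 16)]

df = pd.DataFrame(
    rows, columns=["id", "day", "start", "location", "value", "match"]
)

keys = ["id", "day", "start", "location"]

# Step 1: running count of matches within each group.
df["new_var"] = df.groupby(keys)["match"].cumsum()

# Step 2: keep the last record of each group.
out = df.groupby(keys, as_index=False, sort=False).last()
print(out)
```

One caveat: `last()` returns the last non-null value of each column within a group, so if the data contains missing values, `df.drop_duplicates(subset=keys, keep="last")` may be the safer way to take the literal last row.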
