renjl0810 - 24 days ago
Python Question

Calculation within Pandas dataframe group

I have a pandas DataFrame as shown below. What I'm trying to do is partition (or group) the rows by BlockID, LineID, WordID, and then within each group use

current WordStartX - previous (WordStartX + WordWidth)

to derive another column, e.g. WordDistance, which indicates the distance between this word and the previous word.

The post "Row operations within a group of a pandas dataframe" is very helpful, but in my case multiple columns are involved (WordStartX and WordWidth).
Any comments or suggestions would be much appreciated!

   BlockID  LineID  WordID  WordStartX  WordWidth  WordDistance
0        0       0       0         275        150  0
1        0       0       1         431         96  431-(275+150)=6
2        0       0       2         642         90  642-(431+96)=115
3        0       0       3         746        104  746-(642+90)=14
4        1       0       0         273         69  ...
5        1       0       1         352        151  ...
6        1       0       2         510         92
7        1       0       3         647         90
8        1       0       4         752        105

Answer

The diff() and shift() functions are usually helpful for calculations that refer to previous or next rows:

df['WordDistance'] = (df.groupby(['BlockID', 'LineID'])
        .apply(lambda g: g['WordStartX'].diff() - g['WordWidth'].shift()).fillna(0).values)
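
As a minimal sketch, the same idea can also be written without apply(), using the group-wise shift() directly. This keeps the result aligned with the original index, so it should not depend on the rows already being sorted by group. The sample values are copied from the table in the question; the names grouped and prev_end are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "BlockID":    [0, 0, 0, 0, 1, 1, 1, 1, 1],
    "LineID":     [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "WordID":     [0, 1, 2, 3, 0, 1, 2, 3, 4],
    "WordStartX": [275, 431, 642, 746, 273, 352, 510, 647, 752],
    "WordWidth":  [150, 96, 90, 104, 69, 151, 92, 90, 105],
})

# One group per line of text within a block.
grouped = df.groupby(["BlockID", "LineID"])

# Right edge of the previous word in the same group
# (NaN for the first word of each group).
prev_end = grouped["WordStartX"].shift() + grouped["WordWidth"].shift()

# Gap between this word's start and the previous word's right edge;
# the first word of each group has no predecessor and gets 0.
df["WordDistance"] = (df["WordStartX"] - prev_end).fillna(0).astype(int)

print(df)
```

For BlockID 0 this reproduces the values worked out in the question (0, 6, 115, 14), and for BlockID 1 it gives 0, 10, 7, 45, 15.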
