Bruhnswik Bruhnswik - 2 months ago 6
Python Question

Applying iterative function to every group in pandas DataFrame

I have large pandas DataFrame with following format:

prod_id timestamp text
150523 0006641040 9.393408e+08 text_1
150500 0006641040 9.408096e+08 text_2
150499 0006641041 1.009325e+09 text_3
150508 0006641041 1.018397e+09 text_4
150524 0006641042 1.025482e+09 text_5


DataFrame is sorted by prod_id and timestamp. What I am trying to do, is to enumerate counter for every prod_id based on a timestamp from earliest to latest. For example, I am trying to achieve something like this:

prod_id timestamp text enum
150523 0006641040 9.393408e+08 text_1 1
150500 0006641040 9.408096e+08 text_2 2
150499 0006641041 1.009325e+09 text_3 1
150508 0006641041 1.018397e+09 text_4 2
150524 0006641042 1.025482e+09 text_5 1


I can do this iteratively quite easily by going through each row and increasing counter, but is there a way to do this in a more functional programming fashion?

Thanks

Answer

UPDATE:

In [324]: df
Out[324]:
        prod_id     timestamp    text
150523  6641040  9.393408e+08  text_1
150500  6641040  9.408096e+08  text_2
150501  6641040  9.408096e+08  text_3
150499  6641041  1.009325e+09  text_3
150508  6641041  1.018397e+09  text_4
150524  6641042  1.025482e+09  text_5

In [325]: df['enum'] = df.groupby(['prod_id'])['timestamp'].cumcount() + 1

In [326]: df
Out[326]:
        prod_id     timestamp    text  enum
150523  6641040  9.393408e+08  text_1     1
150500  6641040  9.408096e+08  text_2     2
150501  6641040  9.408096e+08  text_3     3
150499  6641041  1.009325e+09  text_3     1
150508  6641041  1.018397e+09  text_4     2
150524  6641042  1.025482e+09  text_5     1

OLD answer:

In [314]: df['enum'] = df.groupby(['prod_id'])['timestamp'].rank().astype(int)

In [315]: df
Out[315]:
        prod_id     timestamp    text  enum
150523  6641040  9.393408e+08  text_1     1
150500  6641040  9.408096e+08  text_2     2
150499  6641041  1.009325e+09  text_3     1
150508  6641041  1.018397e+09  text_4     2
150524  6641042  1.025482e+09  text_5     1