Miguel Miguel - 1 year ago 81
Python Question

counter for many distinct values in data frame column

I'm having issues building a function that can take the values from a column in a dataframe and keep a counter for each distinct event.

I want to generate the column Cnt from the column Col.

Col Cnt
A 1
B 1
A 2
C 1
B 2
A 3
C 2

My main issue is keeping the value of each counter (the counter for A, for B, for C,...) because the cardinality is quite high (thousands of distinct values and some a almost 3 million records).
I was considering, as a last resource the following:

1 - apply .unique() to the column and store that series in a variable var;

2 - Go through all rows of the dataframe applying the following: IF df.col = var.VALUE1,

3 - Go through all values of the series.

I was hoping if there is a function/way that can allow me to perform this operation a bit more efficiently.


Answer Source

If I understood correctly,

df["Cnt"] = df.groupby("Col").cumcount() + 1

  Col  Cnt
0   A    1
1   B    1
2   A    2
3   C    1
4   B    2
5   A    3
6   C    2