Miguel - 6 months ago 14

Python Question

I'm having issues building a function that can take the values from a column in a dataframe and keep a counter for each distinct event.

I want to generate the column Cnt from the column Col.

`Col Cnt`

A 1

B 1

A 2

C 1

B 2

A 3

C 2

My main issue is keeping the value of each counter (the counter for A, for B, for C,...) because the cardinality is quite high (thousands of distinct values and some a almost 3 million records).

I was considering, as a last resource the following:

1 - apply .unique() to the column and store that series in a variable var;

2 - Go through all rows of the dataframe applying the following: IF df.col = var.VALUE1,

cnt=cnt+1;

3 - Go through all values of the series.

I was hoping if there is a function/way that can allow me to perform this operation a bit more efficiently.

Thanks!

Answer

If I understood correctly,

```
df["Cnt"] = df.groupby("Col").cumcount() + 1
df
Out[29]:
Col Cnt
0 A 1
1 B 1
2 A 2
3 C 1
4 B 2
5 A 3
6 C 2
```