seeiespi - 1 year ago 36

Python Question

I'm working on a time-series analysis I have a dataset that contains at least two indexing variables (in this case ID and Year). After sorting by ID and Year:

`ID Year Grade`

1 1990 B

1 1991 B

1 1991 A

2 2001 B

2 2001 C

2 2001 C

2 2003 A

I would like to group by these variables and assign the order of appearance of the second level group within first group to each observation. In other words, I'd like to generate the "period" variable below:

`ID Year Grade period`

1 1990 B 1

1 1991 B 2

1 1991 A 2

2 2001 B 1

2 2001 C 1

2 2001 C 1

2 2003 A 2

Answer

I found a round about way to go about this. First sort your level variables

```
full_data = full_data.sort_values(by=['ID','Year'],ascending = [True, True])
```

create a binary variable that indicates the first observation in each 2 level group

```
full_data['temp'] = (full_data.groupby(['ID','Year']).cumcount() == 0).astype(int)
```

then calculate the cumulative sum of that variable within each first level group.

```
full_data['period'] = full_data.groupby(['ID']).cumsum()['temp']
```

This answer does not work if you do not sort first. Being fairly inexperienced with Pandas it took me a while to figure this out. It works but I'm curious if anyone has a simpler solution.

Source (Stackoverflow)