user07 - 6 months ago 42

Python Question

I need to convert below SAS code to python pandas. I am not getting exact result with what i have tried.

Below SAS code to convert into pandas:

`proc sql;`

create table t1 as

select

c1, c2, c3, c4, c5, flag, max(flag) as MAX_flag

from t1

group by c1, c2, c3, c5;

run;

I tried like this: Its working for this example but with large dataset i am getting mismatch.

`import pandas as pd`

import numpy as np

df = pd.DataFrame({'A':['z','y','x','x','z','y','z','y','x','z',],

'B':[0,1,2,0,1,2,0,1,2,0],

'ID':[0,1,0,1,0,1,0,2,3,4]})

df=df.sort_values(['A','B','ID'], ascending=[True,True,False])

df.loc[:,'Max']=df.groupby(['A','B'])['ID'].cummax()

I want to convert above SAS to pandas. Please let me know if any one has done it.

Answer

Use transform to get back the result in row-wise form.

```
In [27]: df
Out[27]:
A B ID
0 z 0 0
1 y 1 1
2 x 2 0
3 x 0 1
4 z 1 0
5 y 2 1
6 z 0 0
7 y 1 2
8 x 2 3
9 z 0 4
In [28]: df['max'] = df.groupby(['A', 'B'])['ID'].transform(np.max)
In [29]: df
Out[29]:
A B ID max
0 z 0 0 4
1 y 1 1 2
2 x 2 0 3
3 x 0 1 1
4 z 1 0 0
5 y 2 1 1
6 z 0 0 4
7 y 1 2 2
8 x 2 3 3
9 z 0 4 4
```

Source (Stackoverflow)