user07 user07 - 4 days ago 5
Python Question

Add a new column in Pandas as Max of existing column

I need to convert below SAS code to python pandas. I am not getting exact result with what i have tried.

Below SAS code to convert into pandas:

proc sql;
create table t1 as
select
c1, c2, c3, c4, c5, flag, max(flag) as MAX_flag
from t1
group by c1, c2, c3, c5;
run;


I tried like this: Its working for this example but with large dataset i am getting mismatch.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':['z','y','x','x','z','y','z','y','x','z',],
'B':[0,1,2,0,1,2,0,1,2,0],
'ID':[0,1,0,1,0,1,0,2,3,4]})
df=df.sort_values(['A','B','ID'], ascending=[True,True,False])
df.loc[:,'Max']=df.groupby(['A','B'])['ID'].cummax()


I want to convert above SAS to pandas. Please let me know if any one has done it.

Answer

Use transform to get back the result in row-wise form.

In [27]: df
Out[27]:
   A  B  ID
0  z  0   0
1  y  1   1
2  x  2   0
3  x  0   1
4  z  1   0
5  y  2   1
6  z  0   0
7  y  1   2
8  x  2   3
9  z  0   4

In [28]: df['max'] = df.groupby(['A', 'B'])['ID'].transform(np.max)

In [29]: df
Out[29]:
   A  B  ID  max
0  z  0   0    4
1  y  1   1    2
2  x  2   0    3
3  x  0   1    1
4  z  1   0    0
5  y  2   1    1
6  z  0   0    4
7  y  1   2    2
8  x  2   3    3
9  z  0   4    4
Comments