user6804939 user6804939 - 3 months ago 10
SQL Question

SQL / Pandas equivalent

What would be the Pandas equivalent for this SQL query :

select column1,
sum(column2) as A,
count(distinct column3) as B,
sum(column2) / count(distinct column3) as C
from table1
group by column1


Thanks for any help on that!!

Answer

I'm not sure that the sum(column2) / count(distinct column3) as C part can be done in the same single step, but you can easily do it in two steps:

Demo:

In [47]: df = pd.DataFrame(np.random.randint(0,5,size=(15, 3)), columns=['c1','c2','c3'])
In [48]: df
Out[48]:
    c1  c2  c3
0    4   0   3
1    2   3   2
2    1   2   3
3    3   3   0
4    1   0   4
5    1   1   1
6    2   3   3
7    2   2   2
8    4   0   0
9    1   1   0
10   1   3   0
11   4   3   1
12   0   0   3
13   3   1   0
14   4   3   1

In [49]: x = df.groupby('c1').agg({'c2':'sum', 'c3': 'nunique'}).reset_index().rename(columns={'c2':'A', 'c3':'B'})

In [50]: x
Out[50]:
   c1  A  B
0   0  0  1
1   1  7  4
2   2  8  2
3   3  4  1
4   4  6  3

In [51]: x['C'] = x.A / x.B

In [52]: x
Out[52]:
   c1  A  B     C
0   0  0  1  0.00
1   1  7  4  1.75
2   2  8  2  4.00
3   3  4  1  4.00
4   4  6  3  2.00