Satya - 1 year ago 94
Python Question

# groupby+Arithmatic operation on pandas dataframe in python

Might be a Stupid question but i am confused in between and a logic of how to apply groupby efficiently in this case will be appreciated.

I have a dataframe like

`````` id    NAME    TYPE   SCORE  title
123    DDLJ     cat1   1-6     5
123    DDLJ     cat1   9-10    25
123    DDLJ     cat1     N     5
456    Satya    cat2   9-10    1
456    Satya    cat2    N      3
222    India    cat2   1-6     1
``````

I need to find out for a groupof (id NAME TYPE) a column named 'cat_score' should be formed with logic "for that group [title for SCORE(9-10) - title for SCORE(1-6)] / [sum of title of that group] "

``````#ex for group 123, DDLJ cat1
cat score = (title at SCORE "9-10" - title at SCORE "1-6") / (Sum of title of that group)
= (25 - 5) / (35)
= 0.58
``````

Note::There are 3 types of SCORE ["9-10", "1-6", "N"]. So if for any group any of the score category not found that should be treated as 0 or can be ignored.

My final dataframe should look like

`````` id    NAME    TYPE   SCORE  title  Cat_Score
123    DDLJ     cat1   1-6     5     0.58
123    DDLJ     cat1   9-10    25    0.58
123    DDLJ     cat1     N     5     0.58
456    Satya    cat2   9-10    1     0.34
456    Satya    cat2    N      3     0.34
222    India    cat2   1-6     1      -1
``````

# I have tried by taking one group

``````s = round((int(df[(df['id']=='123') & (df['NAME'] == 'DDLJ') & (df['TYPE']=='cat1') & (df['SCORE']=='9-10')]['title'].values[0]) - int(df[(df['id']=='123') & (df['NAME'] == 'DDLJ') & (df['TYPE']=='cat1') & (df['SCORE']=='1-6')]['title'].values[0])) / (int(df['title'].sum())),2)
s = 0.58
``````

But for all groups, i am confused how to replicate.

I think it would be easier if you first reshape your DataFrame:

``````df2 = df.set_index(['id', 'NAME', 'TYPE']).pivot(columns='SCORE').fillna(0)
df2.columns = df.columns.droplevel(0)
df2
Out:
SCORE           1-6  9-10    N
id  NAME  TYPE
123 DDLJ  cat1  5.0  25.0  5.0
222 India cat2  1.0   0.0  0.0
456 Satya cat2  0.0   1.0  3.0
``````

Now you can do those operations more easily:

``````(df['9-10'] - df['1-6']) / df.sum(axis=1)
Out:
id   NAME   TYPE
123  DDLJ   cat1    0.571429
222  India  cat2   -1.000000
456  Satya  cat2    0.250000
``````

In order to use these in merge, I will reset the index:

``````res = ((df['9-10'] - df['1-6']) / df.sum(axis=1)).reset_index()
res
Out:
id   NAME  TYPE         0
0  123   DDLJ  cat1  0.571429
1  222  India  cat2 -1.000000
2  456  Satya  cat2  0.250000
``````

And finally merge with the original DataFrame:

``````df.merge(res)
Out:
id   NAME  TYPE SCORE  title         0
0  123   DDLJ  cat1   1-6      5  0.571429
1  123   DDLJ  cat1  9-10     25  0.571429
2  123   DDLJ  cat1     N      5  0.571429
3  456  Satya  cat2  9-10      1  0.250000
4  456  Satya  cat2     N      3  0.250000
5  222  India  cat2   1-6      1 -1.000000
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download