Dinosaurius - 4 years ago - 125
Python Question

# Two identical code samples give different results

This is a sample data frame, `df`:

``````
TIME  VP_1  VP_2  VP_3  EVAL
20    3242  3244  3245  0
24    3244  3244  3242  0
30    3456  3244  3456  1
33    3456  3245  3242  0
45    3242  3456  3245  1
``````
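
To experiment with the snippets, the sample frame can be rebuilt in pandas. This is just a minimal sketch that types in the five rows above; the column names are taken from the question.

```python
import pandas as pd

# Rebuild the sample frame from the question, row by row
df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})
print(df)
```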

I am calculating the average `TIME` for each `VP` value (taken from all `VP_*` columns), separately for `EVAL` equal to `0` and `1`.

This is a sample output for `VP` equal to `3242`:

``````
VP     EVAL   AVG_TIME
3242   0      25.67
3242   1      45
``````
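
As a sanity check, the expected numbers for `VP` = `3242` can be computed without `melt`: keep the rows where any `VP_*` column equals 3242, then average `TIME` per `EVAL`. This sketch counts each original row once, which matches the sample output here because 3242 never appears twice in the same row.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

vp_cols = ['VP_1', 'VP_2', 'VP_3']
# Rows where 3242 appears in at least one VP_* column
mask = df[vp_cols].eq(3242).any(axis=1)
# Average TIME per EVAL among those rows, rounded like the sample output
expected = df[mask].groupby('EVAL')['TIME'].mean().round(2)
print(expected)
```

For `EVAL` 0 this averages `TIME` 20, 24 and 33 (77 / 3 ≈ 25.67); for `EVAL` 1 only `TIME` 45 qualifies.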

The problem is that the following two seemingly identical code snippets give different results on my real dataset. I cannot understand why this happens, or which of the two approaches is correct.

Code #1

``````
import pandas as pd

grouped = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
           .drop('variable', axis=1).drop_duplicates()
           .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
           .reset_index())
``````

Code #2

``````
import pandas as pd

cols = ['VP', 'TIME', 'EVAL']
grouped = pd.melt(
    df, ['TIME', 'EVAL'],
    ['VP_1', 'VP_2', 'VP_3'],
    value_name='VP')[cols]
ab = grouped.groupby(['EVAL', 'VP']).agg({'TIME': 'mean'}).reset_index()
``````
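
Running both snippets on the sample frame already reproduces the discrepancy. The sketch below puts the two results side by side and keeps only the groups whose means disagree; the names `g1`, `g2`, `both` and `diff` are illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

# Code #1: melt, drop 'variable', drop duplicate rows, then average
g1 = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
      .drop('variable', axis=1).drop_duplicates()
      .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
      .reset_index())

# Code #2: melt, select columns, then average (no deduplication)
cols = ['VP', 'TIME', 'EVAL']
melted = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]
g2 = melted.groupby(['EVAL', 'VP']).agg({'TIME': 'mean'}).reset_index()

# Align both results on (EVAL, VP) and keep only the groups that disagree
both = g1.merge(g2, on=['EVAL', 'VP'], suffixes=('_code1', '_code2'))
diff = both[(both['TIME_code1'] - both['TIME_code2']).abs() > 1e-9]
print(diff)
```

On the sample this should leave two groups: (`EVAL` 0, `VP` 3244), where Code #1 gives 22.0 and Code #2 about 22.67, and (`EVAL` 1, `VP` 3456), where Code #1 gives 37.5 and Code #2 gives 35.0.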

The difference is `drop_duplicates`.

`drop('variable', axis=1)` does the same thing as `[cols]`: both remove the `variable` column. Only Code #1 also calls:

``````
.drop_duplicates()
``````
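
One way to confirm that this call is the whole difference is to add it to Code #2 and compare the result with Code #1. A minimal sketch on the sample frame; `g2_dedup` is an illustrative name, and `pd.testing.assert_frame_equal` raises if values, dtypes or row order differ.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

# Code #1 as written in the question
g1 = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
      .drop('variable', axis=1).drop_duplicates()
      .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
      .reset_index())

# Code #2 with one change: drop_duplicates() before grouping
cols = ['VP', 'TIME', 'EVAL']
melted = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]
g2_dedup = (melted.drop_duplicates()
            .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
            .reset_index())

# After groupby/reset_index both frames have columns EVAL, VP, TIME
pd.testing.assert_frame_equal(g1, g2_dedup)
print("Code #1 and Code #2 + drop_duplicates() agree")
```

The different column order produced by `[cols]` does not matter here, because `groupby(...).reset_index()` rebuilds the columns as `EVAL, VP, TIME` in both cases.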

With `drop_duplicates()`, rows `6` and `12` are removed because they repeat rows `1` and `2` exactly:

``````
grouped = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
           .drop('variable', axis=1).drop_duplicates())
print(grouped)
    EVAL  TIME    VP
0      0    20  3242
1      0    24  3244
2      1    30  3456
3      0    33  3456
4      1    45  3242
5      0    20  3244
7      1    30  3244
8      0    33  3245
9      1    45  3456
10     0    20  3245
11     0    24  3242
13     0    33  3242
14     1    45  3245
``````

Without `drop_duplicates()` (Code #2), all 15 melted rows are kept:

``````
cols = ['VP', 'TIME', 'EVAL']
grouped = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]
print(grouped)
      VP  TIME  EVAL
0   3242    20     0
1   3244    24     0
2   3456    30     1
3   3456    33     0
4   3242    45     1
5   3244    20     0
6   3244    24     0
7   3244    30     1
8   3245    33     0
9   3456    45     1
10  3245    20     0
11  3242    24     0
12  3456    30     1
13  3242    33     0
14  3245    45     1
``````
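
Which version is "correct" depends on the intent. If a `VP` that occurs in several `VP_*` columns of the same row should be counted only once, Code #1 does that on the sample, but `drop_duplicates()` over `EVAL, TIME, VP` would also merge two *different* rows of a real dataset that happen to share the same `TIME`, `EVAL` and `VP`. Below is a sketch that deduplicates per original row instead. The helper `mean_time_once_per_row`, the temporary column `row_id` and the `toy` frame are all hypothetical, made up for illustration.

```python
import pandas as pd

def mean_time_once_per_row(df, vp_cols):
    """Hypothetical helper: average TIME per (EVAL, VP), counting each VP
    at most once per original row of df."""
    # Tag every original row with a position so duplicates can be detected per row
    long = (df.assign(row_id=list(range(len(df))))
              .melt(id_vars=['row_id', 'EVAL', 'TIME'],
                    value_vars=vp_cols, value_name='VP'))
    # Deduplicate within a row only, not across rows that share TIME/EVAL/VP
    long = long.drop_duplicates(subset=['row_id', 'VP'])
    return long.groupby(['EVAL', 'VP'], as_index=False)['TIME'].mean()

# Made-up toy frame: rows 0 and 1 are distinct rows that share TIME, EVAL and VP_1
toy = pd.DataFrame({'TIME': [10, 10, 40],
                    'VP_1': [1, 1, 1],
                    'VP_2': [2, 3, 4],
                    'EVAL': [0, 0, 0]})

per_row = mean_time_once_per_row(toy, ['VP_1', 'VP_2'])
code1 = (pd.melt(toy, id_vars=['EVAL', 'TIME'], value_name='VP')
         .drop('variable', axis=1).drop_duplicates()
         .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
         .reset_index())
print(per_row)  # VP 1: mean of TIME 10, 10, 40
print(code1)    # VP 1: mean of TIME 10, 40 (one TIME-10 row was dropped)
```

On the question's sample frame, where every `TIME` is unique, this helper gives the same numbers as Code #1; the two only diverge when distinct rows share `TIME`, `EVAL` and a `VP` value.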