Dinosaurius, 4 years ago
Python Question

Two identical code samples give different results

This is a sample data frame, df:

TIME VP_1 VP_2 VP_3 EVAL
20 3242 3244 3245 0
24 3244 3244 3242 0
30 3456 3244 3456 1
33 3456 3245 3242 0
45 3242 3456 3245 1
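To reproduce the behaviour, the sample frame above can be rebuilt in pandas like this. The values are copied from the table; the dtypes are simply what pandas infers.

```python
import pandas as pd

# Sample data from the question: one row per TIME, three VP_* columns and a 0/1 EVAL flag
df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})
print(df)
```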


I am calculating the average TIME per VP value (taken from the VP_* columns), separately for EVAL equal to 0 and EVAL equal to 1.

This is a sample output for VP equal to 3242:

VP EVAL AVG_TIME
3242 0 25.67
3242 1 45
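As a cross-check of the sample output, the average for a single VP can be computed on the wide frame with a boolean mask, without melting. This is an alternative sketch, not either of the two codes below: the mask selects each original row at most once, even if the VP appears in several VP_* columns of that row. The names vp, mask and avg are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

vp = 3242
# True for every row where the VP occurs in at least one VP_* column
mask = df[['VP_1', 'VP_2', 'VP_3']].eq(vp).any(axis=1)
# Mean TIME of the selected rows, split by EVAL
avg = df[mask].groupby('EVAL')['TIME'].mean().round(2)
print(avg)
```

For the sample data this gives 25.67 for EVAL 0 ((20 + 24 + 33) / 3) and 45.0 for EVAL 1, matching the table.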


The problem is that the following two seemingly identical pieces of code give different results on my real dataset. I cannot understand why this happens, or which of the two approaches is correct.

Code #1

grouped = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
             .drop('variable', axis=1).drop_duplicates()
             .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
             .reset_index())


Code #2

cols = ['VP', 'TIME', 'EVAL']
grouped = pd.melt(
    df, ['TIME', 'EVAL'],
    ['VP_1', 'VP_2', 'VP_3'],
    value_name='VP')[cols]
ab = grouped.groupby(['EVAL', 'VP']).agg({'TIME': 'mean'}).reset_index()

Answer

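The difference is drop_duplicates.

drop('variable', axis=1) does the same as [cols]: both remove the variable column created by melt (only the column order differs, which does not matter for the groupby).

But Code #1 additionally calls:

.drop_duplicates()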
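One way to check this on the sample frame is to build both long-format frames and compare them before any grouping. The names a and b are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

# Code #1 style: melt, then drop the helper 'variable' column
a = pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP').drop('variable', axis=1)

# Code #2 style: melt with explicit value_vars, then select the columns to keep
cols = ['VP', 'TIME', 'EVAL']
b = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]

# Same rows and values once the columns are put in the same order
print(a.equals(b[a.columns]))  # True

# Only Code #1 then calls drop_duplicates(), which shrinks the frame
print(len(a), len(a.drop_duplicates()))  # 15 13
```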

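So rows 6 and 12 are removed, because they are exact copies of rows 1 and 2 (same EVAL, TIME and VP). They exist because in the original data the same VP value occurs twice in one row: TIME 24 has 3244 in both VP_1 and VP_2, and TIME 30 has 3456 in both VP_1 and VP_3. Code #2 keeps both copies, so those times are counted twice in the mean.

Which result is correct therefore depends on whether a VP repeated within one row should count once (Code #1) or once per column (Code #2). Note also that drop_duplicates works on the whole melted frame, so on real data it would also merge two different original rows that happen to share the same EVAL, TIME and VP.

Compare the two melted frames: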

grouped = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
             .drop('variable', axis=1).drop_duplicates())
print (grouped)
    EVAL  TIME    VP
0      0    20  3242
1      0    24  3244
2      1    30  3456
3      0    33  3456
4      1    45  3242
5      0    20  3244
7      1    30  3244
8      0    33  3245
9      1    45  3456
10     0    20  3245
11     0    24  3242
13     0    33  3242
14     1    45  3245
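The removed rows can also be listed directly with DataFrame.duplicated(), which flags every row that repeats an earlier identical row. This is a sketch on the sample frame; melted and dups are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

melted = pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP').drop('variable', axis=1)
# duplicated() marks each row that repeats an earlier row with the same EVAL, TIME and VP
dups = melted[melted.duplicated()]
print(dups)
print(dups.index.tolist())  # [6, 12]
```

The returned index values are exactly the rows missing from the printout above.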

cols = ['VP', 'TIME', 'EVAL']
grouped = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]
print (grouped)
      VP  TIME  EVAL
0   3242    20     0
1   3244    24     0
2   3456    30     1
3   3456    33     0
4   3242    45     1
5   3244    20     0
6   3244    24     0
7   3244    30     1
8   3245    33     0
9   3456    45     1
10  3245    20     0
11  3242    24     0
12  3456    30     1
13  3242    33     0
14  3245    45     1
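To see how the extra rows change the final averages, both codes can be run on the sample frame and their results merged on EVAL and VP. The names r1, r2, long2, cmp and diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

# Code #1: long format with duplicates removed
r1 = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
        .drop('variable', axis=1).drop_duplicates()
        .groupby(['EVAL', 'VP']).agg({'TIME': 'mean'})
        .reset_index())

# Code #2: long format without deduplication
cols = ['VP', 'TIME', 'EVAL']
long2 = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]
r2 = long2.groupby(['EVAL', 'VP']).agg({'TIME': 'mean'}).reset_index()

# Put both results side by side and keep only the groups where they disagree
cmp = r1.merge(r2, on=['EVAL', 'VP'], suffixes=('_code1', '_code2'))
diff = cmp[(cmp['TIME_code1'] - cmp['TIME_code2']).abs() > 1e-9]
print(diff.round(2))
```

On the sample data only two groups differ: EVAL 0 / VP 3244 (22.0 with Code #1, about 22.67 with Code #2) and EVAL 1 / VP 3456 (37.5 vs 35.0). Running the same comparison on the real dataset shows which groups are affected there.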