John Titus Jungao John Titus Jungao - 1 month ago 12
Python Question

Using pandas apply

I have a dataframe that is similar to the example below:

sample = {'col1': [50.6, 30.67, 40.5, 0, 0, 0],
'col2': [40.74, 30.33, 41.00, 0, 0, 0]}
df_sample = pd.DataFrame(sample)


Now, in both
col2
and
col3
however, the entries represent two different values. For example, for entry
50.6
, that represents
val1 = 5
and
val2 = 0.6
. Another example would be
41.00
. This value represents
4
and
1.0
.

Basically, what I want to get is a column which can be computed as follows:

df_sample['res'] = df_sample.apply(lambda x:
((x['col2']//10)*(x['col2']%10) + (x['col3']//10)*(x['col3']%10))
/ (x['col2']//10 + x['col3']//10), axis=1)
df_sample.fillna(0)


Basically, it gets the weighted average from the values obtained from each column. Now, what I want to do is scale this method to work with let's say twenty columns without hardcoding each column name in the DataFrame. Please advise.

Answer

You can omit apply and rather use Series (columns of Dataframes):

sample = {'col2': [50.6, 30.67, 40.5, 0, 0, 0],
          'col3': [40.74, 30.33, 41.00, 0, 0, 0],
          'col4': [70.6, 80.67, 70.5, 0, 0, 0],
          'col5': [10.74, 50.33, 51.00, 0, 0, 0]}
df_sample = pd.DataFrame(sample)
print (df_sample)
    col2   col3   col4   col5
0  50.60  40.74  70.60  10.74
1  30.67  30.33  80.67  50.33
2  40.50  41.00  70.50  51.00
3   0.00   0.00   0.00   0.00
4   0.00   0.00   0.00   0.00
5   0.00   0.00   0.00   0.00

I think you need:

print ((((df_sample['col2']//10 * df_sample['col2']%10) + 
        (df_sample['col3']//10 * df_sample['col3']%10) +
        (df_sample['col4']//10 * df_sample['col4']%10) +
        (df_sample['col5']//10 * df_sample['col5']%10)) 
         / (df_sample['col2']//10 + df_sample['col3']//10 + 
            df_sample['col4']//10 + df_sample['col5']//10)).fillna(0))

0    0.641176
1    0.526842
2    0.725000
3    0.000000
4    0.000000
5    0.000000
dtype: float64

print (((df_sample//10 * df_sample%10).sum(axis=1).div((df_sample//10).sum(axis=1)))
         .fillna(0))
0    0.641176
1    0.526842
2    0.725000
3    0.000000
4    0.000000
5    0.000000
dtype: float64

Timings:

In [114]: %timeit ((((df_sample['col2']//10 * df_sample['col2']%10) + (df_sample['col3']//10 * df_sample['col3']%10) + (df_sample['col4']//10 * df_sample['col4']%10) + (df_sample['col5']//10 * df_sample['col5']%10))  / (df_sample['col2']//10 + df_sample['col3']//10 + df_sample['col4']//10 + df_sample['col5']//10)).fillna(0))
100 loops, best of 3: 2.03 ms per loop

In [115]: %timeit (((df_sample//10 * df_sample%10).sum(axis=1).div((df_sample//10).sum(axis=1))).fillna(0))
1000 loops, best of 3: 897 ┬Ás per loop