Han Zhengzu - 2 years ago

# Speed up multi-loop data computation in Pandas

Here is my question, using the dataframe `df` described below as an example:

• The dataframe `df` has 8 columns, each containing finite values.

• What I want to do:

• a. Loop over the dataframe row by row.

• b. In each row, replace the value in each of columns B1, B2, B3, B4, B5 and B6 with that value multiplied by the row's value in column A (B* × A).

My code looks like this:

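for i in range(0, len(df), 1):
    col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]
    for j in range(len(col_B)):
        # Chained assignment: in older pandas this typically writes through to df,
        # but under Copy-on-Write (the default from pandas 3.0) it leaves df unchanged.
        df[col_B[j]].iloc[i] = df[col_B[j]].iloc[i] * df.A.iloc[i]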

On my real data, which contains 224 rows and 9 columns, looping over all these cells takes 0:01:03.

How can I speed up this loop in Pandas?

You can first select the B columns with `filter` and then multiply them row-wise by column A with `mul`:

print(df.filter(like='B').mul(df.A, axis=0))
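The key argument is `axis=0`: it aligns `df.A` with the row index, so every row of the B block is multiplied by that row's A value. A minimal sketch on a smaller, made-up frame (only B1 and B2) that checks this against plain NumPy broadcasting, where A is reshaped into an `(n_rows, 1)` column vector:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9]})

b = df.filter(like='B')            # keeps B1 and B2 (names containing 'B')
result = b.mul(df.A, axis=0)       # axis=0: match df.A against the row index

# Same computation in NumPy: a (3, 1) column of A broadcasts across the B columns
manual = b.to_numpy() * df.A.to_numpy()[:, np.newaxis]

print(result.to_numpy().tolist())           # [[4, 7], [10, 16], [18, 27]]
print((result.to_numpy() == manual).all())  # True
```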

Sample:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

print (df)
   A  B1  B2  B3  B4  B5  B6
0  1   4   7   1   5   7   1
1  2   5   8   3   3   4   3
2  3   6   9   5   6   3   7

print(df.filter(like='B').mul(df.A, axis=0))
   B1  B2  B3  B4  B5  B6
0   4   7   1   5   7   1
1  10  16   6   6   8   6
2  18  27  15  18   9  21
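Note that `filter(like='B')` keeps every column whose name contains the substring `'B'`, so an unrelated column would be multiplied as well. A sketch, assuming the target columns always follow the pattern "B plus digits" and using a hypothetical extra column `Batch` only to show the difference, that selects them with `filter(regex=...)` instead:

```python
import pandas as pd

# 'Batch' is a hypothetical extra column, added only to illustrate the pitfall
df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'Batch': [0, 0, 1]})

print(list(df.filter(like='B').columns))         # ['B1', 'B2', 'Batch']
print(list(df.filter(regex=r'^B\d+$').columns))  # ['B1', 'B2']

# Multiply only the B<digits> columns by A, row by row
result = df.filter(regex=r'^B\d+$').mul(df.A, axis=0)
print(result)
```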

If you also need column A in the result, use `concat`:

print (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
   A  B1  B2  B3  B4  B5  B6
0  1   4   7   1   5   7   1
1  2  10  16   6   6   8   6
2  3  18  27  15  18   9  21
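The question asks to overwrite B1 to B6 inside `df` itself rather than build a new frame. A minimal sketch of doing that with a single vectorised assignment, assuming the target columns are exactly B1 through B6 (the list `col_B` from the question):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]

# Multiply the whole B block by A row-wise, then write it back into df.
# This replaces both nested loops; there is no per-cell indexing at all.
df[col_B] = df[col_B].mul(df["A"], axis=0)

print(df)
```

The printed frame matches the `concat` output above, but column A and the column order of `df` are kept without rebuilding the frame.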

Timings:

len(df)=3:

In [416]: %timeit (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
1000 loops, best of 3: 1.01 ms per loop

In [417]: %timeit loop(df)
100 loops, best of 3: 3.28 ms per loop

len(df)=30k:

In [420]: %timeit (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
The slowest run took 4.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3 ms per loop

In [421]: %timeit loop(df)
1 loop, best of 3: 35.6 s per loop
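The gap comes from the loop doing Python-level indexing once per cell, while `mul` multiplies the whole block in one call. Because the `loop` function below relies on chained assignment, which stops updating `df` under pandas Copy-on-Write, here is a sketch with hypothetical helpers `loop_iat` (the same nested loop rewritten with the scalar accessor `DataFrame.iat`) and `vectorized`. It first checks that both give identical frames, then times them with the standard-library `timeit` module; the absolute numbers depend on the machine and pandas version and are not the figures reported above.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'A': [1, 2, 3],
                     'B1': [4, 5, 6],
                     'B2': [7, 8, 9],
                     'B3': [1, 3, 5],
                     'B4': [5, 3, 6],
                     'B5': [7, 4, 3],
                     'B6': [1, 3, 7]})
df = pd.concat([base] * 100).reset_index(drop=True)  # 300 rows
col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]


def loop_iat(frame):
    """Hypothetical helper: the nested per-cell loop, using .iat for scalar access."""
    out = frame.copy()
    a_pos = out.columns.get_loc("A")
    b_pos = [out.columns.get_loc(c) for c in col_B]
    for i in range(len(out)):
        for j in b_pos:
            out.iat[i, j] = out.iat[i, j] * out.iat[i, a_pos]
    return out


def vectorized(frame):
    """Hypothetical helper: one row-wise mul over the B block, written into a copy."""
    out = frame.copy()
    out[col_B] = out[col_B].mul(out["A"], axis=0)
    return out


# Both approaches must agree before the timings mean anything
pd.testing.assert_frame_equal(loop_iat(df), vectorized(df))

t_loop = timeit.timeit(lambda: loop_iat(df), number=3) / 3
t_vec = timeit.timeit(lambda: vectorized(df), number=3) / 3
print(f"loop_iat:   {t_loop * 1e3:.1f} ms per call")
print(f"vectorized: {t_vec * 1e3:.1f} ms per call")
```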

Code for timings:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

print (df)

df = pd.concat([df]*10000).reset_index(drop=True)

print (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))

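def loop(df):
    for i in range(0, len(df), 1):
        col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]
        for j in range(len(col_B)):
            # Chained assignment: in older pandas this typically writes through to df,
            # but under Copy-on-Write (the default from pandas 3.0) it leaves df unchanged.
            df[col_B[j]].iloc[i] = df[col_B[j]].iloc[i] * df.A.iloc[i]
    return df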

print (loop(df))