Brian Brian - 29 days ago 9
Python Question

Multiply every three rows of dataframe by different value

I have a dataframe with 9 rows. I want to multiply the first three rows by one value, the second three rows by a second value and the third set of 3 rows by yet another value.

I'm using these variables:

import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))

a = pd.Series(range(3))

print(df)

   A  B  C  D  E
0  0  0  0  0  0
1  1  1  1  1  1
2  2  2  2  2  2
3  3  3  3  3  3
4  4  4  4  4  4
5  5  5  5  5  5
6  6  6  6  6  6
7  7  7  7  7  7
8  8  8  8  8  8


I was able to get it to work like this:

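# items() works on old and new pandas; iteritems() was removed in pandas 2.0
for i, e in a.items():
    # len(a) doubles as the group size only because there are 3 groups of 3 rows
    start, end = i * len(a), (i + 1) * len(a)
    df.iloc[start:end] *= e

print(df)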

    A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16

Answer

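Another solution is to multiply df with mul by a NumPy array that numpy.repeat expands to one value per row. Note that a.index.values works here only because the values of a happen to equal its index; for arbitrary multipliers use a.values instead:

import numpy as np

print (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))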
    A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
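
To make the role of numpy.repeat easier to see, here is a minimal self-contained sketch of the same idea. It assumes multipliers that differ from the index (the values 10, 20, 30 and the name rows_per_group are made up for illustration) and therefore uses a.values rather than a.index.values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list("ABCDE"))

# Hypothetical multipliers, chosen so they do not coincide with the index 0, 1, 2
a = pd.Series([10, 20, 30])

# Assumption: every multiplier covers exactly three consecutive rows
rows_per_group = 3

# Repeat each multiplier three times: [10, 10, 10, 20, 20, 20, 30, 30, 30]
factors = np.repeat(a.values, rows_per_group)

# axis=0 lines the factors up with the rows instead of the columns
result = df.mul(factors, axis=0)
print(result)
```

Unlike the loop in the question, this returns a new frame and leaves df unchanged.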

Timings - (len(df)=9):

In [20]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
The slowest run took 6.12 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 197 µs per loop

In [21]: %%timeit 
    ...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)

__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
The slowest run took 6.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 199 µs per loop
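
The second expression timed above reshapes the frame's values so that each group of three rows becomes one row, then relies on NumPy broadcasting. A rough sketch of that idea follows, with two deliberate changes that are assumptions of this sketch rather than part of the timed code: it passes -1 so NumPy infers an integer width (the float df.size / 3 is what triggers the DeprecationWarning under Python 3), and it builds a new DataFrame instead of assigning through df.loc:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list("ABCDE"))
a = pd.Series(range(3))

n_groups = len(a)

# One row of `blocks` per group of three DataFrame rows (3 rows * 5 columns = 15 values).
# -1 lets NumPy work out the width; df.size // n_groups would give the same integer.
blocks = df.values.reshape(n_groups, -1)

# a.values[:, None] has shape (3, 1), so each multiplier broadcasts across its block
scaled = blocks * a.values[:, None]

# Restore the original shape and labels
result = pd.DataFrame(scaled.reshape(df.shape), index=df.index, columns=df.columns)
print(result)
```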

Code for timings - (len(df)=90k):

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([df]*10000).reset_index(drop=True)
a = pd.Series(range(30000))  # one multiplier per three rows: 3 * 30000 = 90000
print (df)

Timings - (len(df)=90k):

In [24]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
100 loops, best of 3: 3.58 ms per loop

In [33]: %%timeit
    ...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
    ...: 
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop

In [34]: %%timeit
    ...: df.iloc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
    ...: 
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop
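
For anyone who wants to rerun the comparison outside IPython, the sketch below uses the standard library's timeit module. It assumes a holds one multiplier per group of three rows (30,000 of them for 90,000 rows). The helper names with_repeat and with_reshape are made up here, and the reshape variant is generalised to one multiplier per group, so it is not the exact expression timed above. Absolute numbers will vary by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame([[i] * 5 for i in range(9)], columns=list("ABCDE"))
df = pd.concat([base] * 10000).reset_index(drop=True)  # 90,000 rows
a = pd.Series(range(len(df) // 3))  # 30,000 multipliers, one per three rows


def with_repeat():
    # Expand the multipliers to one per row, then multiply row-wise
    return df.mul(np.repeat(a.values, 3), axis=0)


def with_reshape():
    # Fold each group of three rows into one row, broadcast, then unfold
    scaled = df.values.reshape(len(a), -1) * a.values[:, None]
    return pd.DataFrame(scaled.reshape(df.shape), index=df.index, columns=df.columns)


# Check that both approaches agree before comparing their speed
same = np.array_equal(with_repeat().values, with_reshape().values)
print("results match:", same)

for func in (with_repeat, with_reshape):
    # Best of 3 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print("{}: {:.2f} ms per loop".format(func.__name__, best * 1e3))
```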