Brian - 1 year ago 80
Python Question

# Multiply every three rows of dataframe by different value

I have a dataframe with 9 rows. I want to multiply the first three rows by one value, the second three rows by a second value and the third set of 3 rows by yet another value.

I'm using these variables:

``````import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))

a = pd.Series(range(3))

print(df)

   A  B  C  D  E
0  0  0  0  0  0
1  1  1  1  1  1
2  2  2  2  2  2
3  3  3  3  3  3
4  4  4  4  4  4
5  5  5  5  5  5
6  6  6  6  6  6
7  7  7  7  7  7
8  8  8  8  8  8
``````

I was able to get it to work like this:

``````for i, e in a.items():
    # len(a) doubles as the group size here because there are 3 groups of 3 rows
    start, end = i * len(a), (i + 1) * len(a)
    df.iloc[start:end] *= e

print(df)

   A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
``````

Another solution is to multiply `df` using `mul` with a NumPy array expanded by `numpy.repeat`:

``````import numpy as np

# Repeat each multiplier 3 times so there is one value per row, then multiply row-wise
print (df.mul(np.repeat(a.values, 3), axis=0))
   A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
``````
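The same idea can be wrapped in a small helper for other group sizes. This is only a minimal sketch, assuming the number of rows is an exact multiple of the group size; `multiply_row_groups` and `group_size` are names made up for this illustration and are not part of pandas:

```python
import numpy as np
import pandas as pd


def multiply_row_groups(df, multipliers, group_size):
    """Multiply each consecutive block of `group_size` rows by one multiplier."""
    multipliers = np.asarray(multipliers)
    if len(df) != len(multipliers) * group_size:
        raise ValueError("len(df) must equal len(multipliers) * group_size")
    # Expand [m0, m1, ...] to [m0, m0, m0, m1, m1, m1, ...], one entry per row
    per_row = np.repeat(multipliers, group_size)
    # axis=0 applies per_row[i] to every column of row i
    return df.mul(per_row, axis=0)


df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
result = multiply_row_groups(df, [0, 1, 2], group_size=3)
print(result)
```

The helper returns a new frame and leaves `df` untouched, unlike the in-place loop from the question.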

Timings - (`len(df)=9`), comparing `mul` + `np.repeat` with a reshape-based assignment:

``````In [20]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
The slowest run took 6.12 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 197 µs per loop

In [21]: %%timeit
...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)

__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
The slowest run took 6.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 199 µs per loop
``````
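The second statement timed above relies on NumPy broadcasting instead of `np.repeat`. Here is a sketch of how it works, split into steps. It uses `reshape(..., -1)` so that no float is passed as a dimension, which appears to be what triggers the `DeprecationWarning` in the transcript. The names `multipliers`, `grouped` and `scaled` are chosen just for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
multipliers = np.arange(3)

# Treat the 9x5 values as 3 groups of 15 numbers (3 rows x 5 columns per group)
grouped = df.values.reshape(len(multipliers), -1)
# multipliers[:, None] has shape (3, 1) and broadcasts across each group of 15
scaled = grouped * multipliers[:, None]
# Restore the original 9x5 layout and rebuild the frame with the same labels
result = pd.DataFrame(scaled.reshape(df.shape), index=df.index, columns=df.columns)
print(result)
```

Building a new frame rather than assigning through `df.loc[:, :]` is a choice made for the sketch; it keeps `df` unchanged so the result can be compared with the `np.repeat` version.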

Code for timings - (`len(df)=90k`):

``````df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([df]*10000).reset_index(drop=True)
a = pd.Series(range(30000))  # one multiplier per group of 3 rows (90000 / 3)
print (df)
``````

Timings - (`len(df)=90k`):

``````In [24]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
100 loops, best of 3: 3.58 ms per loop

In [33]: %%timeit
...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
...:
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop

In [34]: %%timeit
...: df.iloc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
...:
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop
``````
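To rerun a comparison like this on a current Python 3 setup, something along these lines could be used. It is a sketch, not a reproduction of the exact statements timed above: absolute numbers will vary by machine and library version, the multiplier series is sized at one value per group of three rows (30000 here), and `with_repeat` / `with_reshape` are names made up for the example.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([df] * 10000).reset_index(drop=True)  # 90000 rows
a = pd.Series(range(len(df) // 3))                   # 30000 multipliers


def with_repeat():
    # One multiplier per row, applied row-wise
    return df.mul(np.repeat(a.values, 3), axis=0)


def with_reshape():
    # Each row of the reshaped array holds one group of 3 rows (15 values)
    scaled = df.values.reshape(len(a), -1) * a.values[:, None]
    return pd.DataFrame(scaled.reshape(df.shape), index=df.index, columns=df.columns)


# Both approaches should agree before their speed is compared
assert np.array_equal(with_repeat().values, with_reshape().values)

for fn in (with_repeat, with_reshape):
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```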