wwl - 4 years ago 85
Python Question

# Pandas: create new column which swaps values of other rows

I'm trying to create a pandas dataframe like this:

```
           x2        x3
0    3.536220  0.681269
1    0.681269  3.536220
2   -0.402380  2.303833
3    2.303833 -0.402380
4    2.032329  3.334412
5    3.334412  2.032329
6    0.371338  5.879732
...
```

So `x2` is a column of random numbers, and `x3` holds the same values with rows 0 and 1 swapped, rows 2 and 3 swapped, and so on. My current code is:

```python
import numpy as np
import pandas as pd
x2 = pd.Series(np.random.normal(loc=2, scale=2.5, size=1000))
x3 = pd.Series([x2[i + 1] if i % 2 == 0 else x2[i - 1] for i in range(1000)])
df = pd.DataFrame({'x2': x2, 'x3': x3})
```

Is there a faster or more elegant way to do this, particularly if I want many rows (e.g. 1 million) or need to repeat it many times (e.g. in a Monte Carlo simulation)?
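The pairing rule described above (row `i` takes its value from row `i + 1` when `i` is even and from row `i - 1` when `i` is odd) is the same as taking row `i ^ 1`, because XOR with 1 flips the lowest bit. A minimal sketch of that index mapping with NumPy fancy indexing, assuming an even number of rows (the name `partner` is illustrative only):

```python
import numpy as np
import pandas as pd

# Assumption: the Series has an even number of rows, so every row has a partner.
x2 = pd.Series(np.random.normal(loc=2, scale=2.5, size=1000))

# i ^ 1 flips the lowest bit: 0 <-> 1, 2 <-> 3, 4 <-> 5, ...
partner = np.arange(len(x2)) ^ 1
print(partner[:6])  # [1 0 3 2 5 4]

# Fancy indexing with the partner positions builds the swapped column in one step.
x3 = pd.Series(x2.to_numpy()[partner])
df = pd.DataFrame({'x2': x2, 'x3': x3})
```

Fancy indexing creates a copy through an index array, so it is usually a little slower than plain slicing, but it expresses the rule in a single line.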

Instead of building the column with a list comprehension,

```python
[x2[i + 1] if i % 2 == 0 else x2[i - 1] for i in range(1000)]
```

you could use NumPy slice assignment:

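```python
import numpy as np

def swap(arr):
    # Assumes len(arr) is even, so every element has a partner.
    result = np.empty_like(arr)
    result[::2] = arr[1::2]   # even positions take the next (odd) element
    result[1::2] = arr[::2]   # odd positions take the previous (even) element
    return result
```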
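A quick check of `swap` on a small array makes the slicing visible: `arr[::2]` selects the even positions and `arr[1::2]` the odd ones, and each is written into the other's slots. As written, `swap` assumes an even length; with an odd length the two slices differ in size and NumPy raises a `ValueError`.

```python
import numpy as np

def swap(arr):
    # Same function as above: exchange each even/odd pair of elements.
    result = np.empty_like(arr)
    result[::2] = arr[1::2]   # even slots get the following odd element
    result[1::2] = arr[::2]   # odd slots get the preceding even element
    return result

print(swap(np.arange(6)))  # [1 0 3 2 5 4]

try:
    swap(np.arange(5))      # odd length: slices of sizes 3 and 2 do not match
except ValueError as exc:
    print("odd length not supported:", exc)
```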

For a sequence of length 1000, using `swap` is over 3000x faster:

```
In [84]: %timeit [x2[i + 1] if i % 2 == 0 else x2[i - 1] for i in range(1000)]
100 loops, best of 3: 12.7 ms per loop

In [98]: %timeit  swap(x2.values)
100000 loops, best of 3: 3.82 µs per loop
```
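The question also asks about 1 million rows. A rough sketch for repeating the comparison at that size with the standard-library `timeit` module (instead of IPython's `%timeit`); timings depend on the machine, so none are quoted here. It also times a reshape-based variant, `reversed_pairs` (a hypothetical name, not from the answer), which views the array as rows of two, reverses each row, and flattens the result.

```python
import timeit
import numpy as np

def swap(arr):
    # Slice-assignment version from the answer above (even length assumed).
    result = np.empty_like(arr)
    result[::2] = arr[1::2]
    result[1::2] = arr[::2]
    return result

def reversed_pairs(arr):
    # Hypothetical alternative: view as (n/2, 2), flip each pair, flatten to a copy.
    return arr.reshape(-1, 2)[:, ::-1].ravel()

arr = np.random.normal(loc=2, scale=2.5, size=1_000_000)
assert np.array_equal(swap(arr), reversed_pairs(arr))  # both give the same result

for func in (swap, reversed_pairs):
    # Best of 5 repeats of 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: func(arr), number=10, repeat=5)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```

Both functions do a single vectorized pass over the data, so either should scale roughly linearly with the number of rows.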

To compare both approaches in one DataFrame:

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
x2 = pd.Series(np.random.normal(loc=2, scale=2.5, size=1000))
x3 = [x2[i + 1] if i % 2 == 0 else x2[i - 1] for i in range(1000)]

def swap(arr):
    result = np.empty_like(arr)
    result[::2] = arr[1::2]
    result[1::2] = arr[::2]
    return result

df = pd.DataFrame({'x2': x2, 'x3': x3, 'x4': swap(x2.values)})
```

In the resulting `df`, the `x3` column (list comprehension) and the `x4` column (`swap`) contain the same values.
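For the Monte Carlo case, the same slicing works along the last axis of a 2-D array, so many simulated columns can be swapped at once rather than calling `swap` in a Python loop. A sketch, assuming each simulation is one row of length 1000 and using `np.random.default_rng` (available in NumPy 1.17 and later); `n_sims` and `swap_pairs` are hypothetical names:

```python
import numpy as np

def swap_pairs(arr):
    # Swap adjacent pairs along the last axis; the last axis must have even length.
    result = np.empty_like(arr)
    result[..., ::2] = arr[..., 1::2]
    result[..., 1::2] = arr[..., ::2]
    return result

rng = np.random.default_rng(2017)
n_sims = 500
# One row per simulation, 1000 draws each, same distribution as x2 in the question.
x2_all = rng.normal(loc=2, scale=2.5, size=(n_sims, 1000))
x3_all = swap_pairs(x2_all)

# Each row of x3_all is the pairwise swap of the matching row of x2_all.
assert np.array_equal(x3_all[0, :4], x2_all[0, [1, 0, 3, 2]])
```

Generating all draws in one call and swapping them in one call keeps the per-simulation Python overhead out of the inner loop.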