Gilberto - 1 year ago 260
Python Question

# How to get pandas dataframe where columns are the subsequent n-elements from another column dataframe?

A very simple example just for understanding.

I have the following pandas dataframe:

``````import pandas as pd
df = pd.DataFrame({'A':pd.Series([1, 2, 13, 14, 25, 26, 37, 38])})
df
A
0   1
1   2
2  13
3  14
4  25
5  26
6  37
8  38
``````

Set
`n = 3`

### First example

How to get a new dataframe
`df1`
(in an efficient way), like the following:

``````   D1  D2  D3     T
0   1   2  13    14
1   2  13  14    25
2  13  14  25    26
3  14  25  26    37
4  25  26  37    38
``````

Hint: think at the first n-columns as the data (Dx) and the last columns as the target (T). In the 1st example the target (e.g 25) depends on the preceding n-elements (2, 13, 14).

### Second example

What if the target is some element ahead (e.g.+3)?

``````   D1  D2  D3     T
0   1   2  13    26
1   2  13  14    37
2  13  14  25    38
``````

Gilberto

P.S. If you think that the title can be improved, please suggest me how to modify it.

Let's see how we can solve it with NumPy tools. So, let's imagine you have the column data as a NumPy array, let's call it `a`. For such sliding windowed operations, we have a very efficient tool in NumPy as `strides`, as they are `views` into the input array without actually making copies.

Let's directly use the methods with the sample data and start with case #1 -

``````In [29]: a  # Input data
Out[29]: array([ 1,  2, 13, 14, 25, 26, 37, 38])

In [30]: m = a.strides[0] # Get strides

In [31]: n = 3 # parameter

In [32]: nrows = a.size - n # Get number of rows in o/p

In [33]: a2D = np.lib.stride_tricks.as_strided(a,shape=(nrows,n+1),strides=(m,m))

In [34]: a2D
Out[34]:
array([[ 1,  2, 13, 14],
[ 2, 13, 14, 25],
[13, 14, 25, 26],
[14, 25, 26, 37],
[25, 26, 37, 38]])

In [35]: np.may_share_memory(a,a2D)
Out[35]: True    # a2D is a view into a
``````

Case #2 would be similar with an additional parameter for the `Target` column -

``````In [36]: n2 = 3 # Additional param

In [37]: nrows = a.size - n - n2 + 1

In [38]: part1 = np.lib.stride_tricks.as_strided(a,shape=(nrows,n),strides=(m,m))

In [39]: part1 # These are D1, D2, D3, etc.
Out[39]:
array([[ 1,  2, 13],
[ 2, 13, 14],
[13, 14, 25]])

In [43]: part2 = a[n+n2-1:] # This is target col

In [44]: part2
Out[44]: array([26, 37, 38])
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download