Gilberto - 1 year ago 138

Python Question

A very simple example just for understanding.

I have the following pandas dataframe:

`import pandas as pd`

df = pd.DataFrame({'A':pd.Series([1, 2, 13, 14, 25, 26, 37, 38])})

df

A

0 1

1 2

2 13

3 14

4 25

5 26

6 37

8 38

Set

`n = 3`

How to get a new dataframe

`df1`

`D1 D2 D3 T`

0 1 2 13 14

1 2 13 14 25

2 13 14 25 26

3 14 25 26 37

4 25 26 37 38

Hint: think at the first n-columns as the

What if the

`D1 D2 D3 T`

0 1 2 13 26

1 2 13 14 37

2 13 14 25 38

Thank you for your help,

Gilberto

P.S. If you think that the title can be improved, please suggest me how to modify it.

Answer Source

Let's see how we can solve it with NumPy tools. So, let's imagine you have the column data as a NumPy array, let's call it `a`

. For such sliding windowed operations, we have a very efficient tool in NumPy as `strides`

, as they are `views`

into the input array without actually making copies.

Let's directly use the methods with the sample data and start with case #1 -

```
In [29]: a # Input data
Out[29]: array([ 1, 2, 13, 14, 25, 26, 37, 38])
In [30]: m = a.strides[0] # Get strides
In [31]: n = 3 # parameter
In [32]: nrows = a.size - n # Get number of rows in o/p
In [33]: a2D = np.lib.stride_tricks.as_strided(a,shape=(nrows,n+1),strides=(m,m))
In [34]: a2D
Out[34]:
array([[ 1, 2, 13, 14],
[ 2, 13, 14, 25],
[13, 14, 25, 26],
[14, 25, 26, 37],
[25, 26, 37, 38]])
In [35]: np.may_share_memory(a,a2D)
Out[35]: True # a2D is a view into a
```

Case #2 would be similar with an additional parameter for the `Target`

column -

```
In [36]: n2 = 3 # Additional param
In [37]: nrows = a.size - n - n2 + 1
In [38]: part1 = np.lib.stride_tricks.as_strided(a,shape=(nrows,n),strides=(m,m))
In [39]: part1 # These are D1, D2, D3, etc.
Out[39]:
array([[ 1, 2, 13],
[ 2, 13, 14],
[13, 14, 25]])
In [43]: part2 = a[n+n2-1:] # This is target col
In [44]: part2
Out[44]: array([26, 37, 38])
```