cjm2671 - 7 months ago
Python Question

Advanced array/dataframe slicing (numpy/pandas)

I'm trying to generate 50 random samples, each covering 30 consecutive days, from a DataFrame of corn prices (which is indexed by date).

So far, the first line selects 50 random days. For the second line, what I really want is an array of DataFrames, each one containing the 30 days starting from a sample date. Currently it just returns the price on each sampled day.

samples = np.random.choice(corn[:'1981'].index, 50)
corn['Open'][samples]  # line I need to fix


What's the cleanest way of doing that?

Answer

You could use

corn.loc[date:date+pd.Timedelta(days=30)]

to select the rows from the date held in the variable date through date + 30 days (both endpoints are included). To get a list of DataFrames, use a list comprehension:

dfs = [corn.loc[date:date+pd.Timedelta(days=30)] for date in samples]
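
Because .loc slices by label and includes both endpoints, the window above spans 31 calendar days when the data has one row per day. The sketch below illustrates this on made-up data (the corn frame and the chosen date here are placeholders, not the asker's real prices) and shows that an offset of 29 days gives exactly 30 rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the corn prices: one row per calendar day.
corn = pd.DataFrame({'Open': np.arange(100.0)},
                    index=pd.date_range('1980-01-01', periods=100))

date = pd.Timestamp('1980-02-01')  # arbitrary start date for illustration

# Label-based .loc slices include both the start and the end label.
window_31 = corn.loc[date:date + pd.Timedelta(days=30)]
window_30 = corn.loc[date:date + pd.Timedelta(days=29)]

print(len(window_31), len(window_30))  # 31 30
```

If the index skips days (for example, trading days only), both windows will contain fewer rows than calendar days.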

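A complete, runnable example:

import numpy as np
import pandas as pd

# Dummy data: one random price per calendar day, starting 1980-01-01
N = 365
corn = pd.DataFrame({'Open': np.random.random(N)},
                    index=pd.date_range('1980-1-1', periods=N))
# Pick 50 random start dates (with replacement) from the rows up to the end of 1981
samples = np.random.choice(corn[:'1981'].index, 50)
# One DataFrame per start date; windows that start near the end of the data are shorter
dfs = [corn.loc[date:date + pd.Timedelta(days=30)] for date in samples]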
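
One caveat with sampling start dates from the whole index is that a start date near the end of the data yields a window shorter than 30 days. If every sample must have the same length, one possible alternative (a sketch, not part of the original answer) is to sample integer start positions and slice by position with iloc. The names WINDOW, N_SAMPLES, rng and starts are made up for this illustration, and the window is counted in rows rather than calendar days:

```python
import numpy as np
import pandas as pd

N = 365
WINDOW = 30      # rows per sample (assumed to mean 30 consecutive rows)
N_SAMPLES = 50

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
corn = pd.DataFrame({'Open': rng.random(N)},
                    index=pd.date_range('1980-01-01', periods=N))

# Draw start positions that always leave room for a full window.
# The upper bound of rng.integers is exclusive.
starts = rng.integers(0, len(corn) - WINDOW + 1, size=N_SAMPLES)

# iloc slices are half-open, so each frame has exactly WINDOW rows.
dfs = [corn.iloc[s:s + WINDOW] for s in starts]
```

With daily data and no gaps, 30 rows and 30 calendar days coincide; with gaps in the index, pick whichever definition of "30 days" the analysis needs.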