cjm2671 - 1 year ago
Python Question

Advanced array/dataframe slicing (numpy/pandas)

I'm trying to generate 50 random samples, each a period of 30 consecutive days, from a list of corn prices (which is indexed by date).

So far I've got 'select 50 random days' on line one. For the second line, what I really want is an array of DataFrames, each one containing the 30 days starting from a sample date. Currently it just returns the price on each sampled day.

corn['Open'][samples] #line I need to fix

What's the cleanest way of doing that?

Answer Source

You could use

corn.loc[date:date+pd.Timedelta(days=30)]

to select 30 days' worth of rows starting from a given date (here called date). To get a list of DataFrames, one per sample date, use a list comprehension:

dfs = [corn.loc[date:date+pd.Timedelta(days=30)] for date in samples]

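For example:

import numpy as np
import pandas as pd

# Build a year of synthetic daily prices indexed by date
N = 365
corn = pd.DataFrame({'Open': np.random.random(N)},
                    index=pd.date_range('1980-1-1', periods=N))
# Pick 50 random start dates (with replacement) from the index
samples = np.random.choice(corn.loc[:'1981'].index, 50)
# One DataFrame per start date, covering that date through 30 days later
dfs = [corn.loc[date:date + pd.Timedelta(days=30)] for date in samples]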
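
One caveat with the label-based approach: .loc slices include both endpoints, so date:date+pd.Timedelta(days=30) returns up to 31 rows, and a window that starts near the end of the data is cut short. If every sample must have exactly 30 rows, a positional variant may be a better fit. The following is only a sketch, assuming one row per day with no gaps; the names window, n_samples, rng and starts are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Synthetic data mirroring the example above: one random price per day.
N = 365
corn = pd.DataFrame({'Open': np.random.random(N)},
                    index=pd.date_range('1980-1-1', periods=N))

window = 30      # rows per sample (hypothetical name)
n_samples = 50   # number of windows to draw (hypothetical name)

# Draw start positions so that every window fits inside the data.
# The upper bound of Generator.integers is exclusive.
rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
starts = rng.integers(0, len(corn) - window + 1, size=n_samples)

# Positional slicing excludes the end point, so each window has exactly 30 rows.
dfs = [corn.iloc[start:start + window] for start in starts]
```

Because the start positions are drawn independently, windows can overlap or repeat, just as the sampled dates can with np.random.choice.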