sascha sascha - 3 months ago 16
Python Question

Select Days from Pandas DataFrame

I have a Pandas DataFrame like this:


╔════════════╦═══════╗
║ DATE ║ VALUE ║
╠════════════╬═══════╣
║ 2011-01-07 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-08 ║ 2 ║
╠════════════╬═══════╣
║ 2011-01-09 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-10 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1 ║
╚════════════╩═══════╝


What I want to do now is select the last three days present in the data, up to and including 2011-01-20. Selecting via
df.loc[pd.Timestamp('2011-01-20') - pd.Timedelta(3, unit='d'):'2011-01-20']
results in the following data frame:


╔════════════╦═══════╗
║ DATE ║ VALUE ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1 ║
╚════════════╩═══════╝


What I want to accomplish is the following data frame:


╔════════════╦═══════╗
║ DATE ║ VALUE ║
╠════════════╬═══════╣
║ 2011-01-09 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-10 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1 ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1 ║
╚════════════╩═══════╝


What I don't want to do is groupby or resample the data frame or anything like that, because I need to preserve the structure for the subsequent processing. Does anybody know how I can solve this problem? Thanks in advance!

Answer

You can create a consecutive id column so that each date has a unique id which increases with the date and then subset based on the id column:

import pandas as pd

# make sure `DATE` is a datetime column (this assumes DATE is a column, not the index)
df['DATE'] = pd.to_datetime(df['DATE'])

# give each distinct date a consecutive id that increases with the date;
# ngroup() numbers the groups in sorted key order, so the rows need not be sorted
df['DateId'] = df.groupby('DATE').ngroup()

# find the id of the target date as a scalar
MaxId = df.loc[df['DATE'] == '2011-01-20', 'DateId'].iloc[0]

# subset based on the id column: keep the target date and the two dates before it
df.loc[df['DateId'].between(MaxId - 2, MaxId), ['DATE', 'VALUE']]

#         DATE  VALUE
# 2 2011-01-09      1
# 3 2011-01-10      1
# 4 2011-01-20      1
# 5 2011-01-20      1
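
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea without the extra id column. It rebuilds the sample frame from the question and wraps the selection in a helper called `last_n_days`, which is a hypothetical name for illustration and not part of pandas. It assumes `DATE` is a regular column; if the target date is missing from the data, this version falls back to the most recent dates before it, which may or may not be what you want.

```python
import pandas as pd


def last_n_days(df, target, n=3, date_col="DATE"):
    """Return the rows for the n most recent distinct dates up to `target`.

    Hypothetical helper for illustration only. The original row order
    and index are preserved; no grouping or resampling of the result.
    """
    dates = pd.to_datetime(df[date_col])
    target = pd.Timestamp(target)
    # distinct dates on or before the target, in ascending order
    distinct = dates[dates <= target].drop_duplicates().sort_values()
    # keep only the last n of them
    wanted = distinct.iloc[-n:]
    # boolean mask over the original rows
    return df[dates.isin(wanted)]


# sample data from the question
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2011-01-07", "2011-01-08", "2011-01-09",
                            "2011-01-10", "2011-01-20", "2011-01-20"]),
    "VALUE": [1, 2, 1, 1, 1, 1],
})

result = last_n_days(df, "2011-01-20", n=3)
print(result)
```

On the sample data this should select the rows with index 2 to 5, the same rows as the output shown above. Because it only builds a boolean mask, the frame itself is left unchanged.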