Justin Justin - 1 year ago 46
Python Question

How to identify consecutive dates

I would like to identify dates in a dataframe which are consecutive, that is there exists either an immediate predecessor or successor. I would then like to mark which dates are and are not consecutive in a new column. Additionally I would like to do this operation within particular subsets of my data.

First I create a new variable where I'd identify True of False for Consecutive Days.

weatherFile['CONSECUTIVE_DAY'] = 'NA'

I've converted dates into datetime objects then to ordinal ones:

weatherFile['DATE_OBJ'] = [datetime.strptime(d, '%Y%m%d') for d in weatherFile['DATE']]
weatherFile['DATE_INT'] = list([d.toordinal() for d in weatherFile['DATE_OBJ']])

Now I would like to identify consecutive dates in the following groups:

weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])

I am thinking to loop through the groups and applying an operation that will identify which days are consecutive and which are not, within unique county, tempbin subsets.

I'm rather new to programming and python, is this a good approach so far, if so how can I progress?

Thank you - Let me know if I should provide additional information.


Using @karakfa advice I tried the following:

weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])
weatherFile['DISTANCE'] = weatherFile[1:, 'DATE_INT'] - weatherFile[:-1,'DATE_INT']
weatherFile['CONSECUTIVE?'] = np.logical_or(np.insert((weatherFile['DISTANCE']),0,0) == 1, np.append((weatherFile['DISTANCE']),0) == 1)

This resulting in a TypeError: unhashable type. Traceback happened in the second line. weatherFile['DATE_INT'] is dtype: int64.

Answer Source

You can use .shift(-1) or .shift(1) to compare consecutive entries:

df.loc[df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1, 'CONSECUTIVE_DAY'] = True

Will set CONSECUTIVE_DAY to TRUE if the previous entry is the previous day

df.loc[(df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1) | (df['DATE_INT'].shift(1) - df['DATE_INT'] == -1), 'CONSECUTIVE_DAY'] = True

Will set CONSECUTIVE_DAY to TRUE if the entry is preceeded by or followed by a consecutive date.