Mike Cuddy Mike Cuddy - 1 year ago 90
Python Question

Pandas/python and working with a column, in a dataframe, with a date

I am currently working on a Python/Pandas data science project for fun. The data that I am looking at has a Date column where the date looks like the following: 2016-07-16. The data type is also an object. What I want to do is go through each date and pull data from across that row. Now, some rows may have the same date because two separate attacks occurred on that date. (I am looking at terrorism data.) What I currently have done is the following:

dates = []
start = 0;
while start < 300:
date = data.iat[start, 1]
start += 1

This will get me ALMOST what I want. However, I have two problems, the start variable is set to 0 but I cannot go to 365 since, like I said, each date may have multiple attacks. So one year may have like 400 attacks. Is there a way that I could end the data collection at 2016-12-31 or 2017-01-01 for example? Basically, is there a way to quickly determine the number of attacks, per year for year after year? Thank you for any help!

Oh I will say that I was trying something like:

newDate = pd.to_datetime(startdate) + pd.DateOffset(days=1)


data['Date']) + timedelta(days=1)

to add one to the date to end at the year. Not getting what I wanted plus, there could be more than one entry per day.

to explain further I could have something like this:

Date Deaths Country
2002-01-01 2 India
2002-01-02 0 Pakistan
2001-01-02 1 France

The data has about 20,000 points and I need to find a way to stop it at the end of each year. That is my main issue. I cannot go to 365 because there may be multiple terrorist attacks on the same date around the world.

Answer Source

IMO there is no need to add a new column:

In [132]: df
        Date  Deaths   Country
0 2002-01-01       2     India
1 2002-01-02       0  Pakistan
2 2001-01-02       1    France

In [134]: df.groupby(df.Date.dt.year).size()
2001    1
2002    2
dtype: int64


In [135]: df.groupby(pd.TimeGrouper(freq='AS', key='Date')).size()
2001-01-01    1
2002-01-01    2
Freq: AS-JAN, dtype: int64

In [133]: df.groupby(pd.TimeGrouper(freq='A', key='Date')).size()
2001-12-31    1
2002-12-31    2
Freq: A-DEC, dtype: int64

and you can always access different parts (year, month, day, weekday, hour, etc.) of your DateTime column:

In [137]: df.Date.dt.year
0    2002
1    2002
2    2001
Name: Date, dtype: int64

In [138]: df.Date.dt.
df.Date.dt.ceil             df.Date.dt.freq             df.Date.dt.microsecond      df.Date.dt.strftime         df.Date.dt.weekday
df.Date.dt.date             df.Date.dt.hour             df.Date.dt.minute           df.Date.dt.time             df.Date.dt.weekday_name
df.Date.dt.day              df.Date.dt.is_month_end     df.Date.dt.month            df.Date.dt.to_period        df.Date.dt.weekofyear
df.Date.dt.dayofweek        df.Date.dt.is_month_start   df.Date.dt.nanosecond       df.Date.dt.to_pydatetime    df.Date.dt.year
df.Date.dt.dayofyear        df.Date.dt.is_quarter_end   df.Date.dt.normalize        df.Date.dt.tz
df.Date.dt.days_in_month    df.Date.dt.is_quarter_start df.Date.dt.quarter          df.Date.dt.tz_convert
df.Date.dt.daysinmonth      df.Date.dt.is_year_end      df.Date.dt.round            df.Date.dt.tz_localize
df.Date.dt.floor            df.Date.dt.is_year_start    df.Date.dt.second           df.Date.dt.week