sschade sschade - 29 days ago 12
Python Question

Average of Groupby into a dataframe timedelta64[ns]

df1

|Project |Days
|A |20 days
|B |10 days
|A |10 days
|C |5 days
|C |7 days
|B |8 days


R = df1['Days'].groupby(df1['Project'])

R

|20 days
|10 days
|Name: Days, dtype: timedelta64[ns],('A', 30 15 days)
|10 days
|8 days
|Name: Days, dtype: timedelta64[ns],('B', 18 9 days)
|5 days
|7 days
|Name: Days, dtype: timedelta64[ns],('C', 12 6 days)


df2

|Project |Date |**New Date**
|A |1/10/16 |1/25/16
|A |1/8/16 |1/23/16
|C |1/2/16 |1/8/16
|B |1/9/16 |1/18/16


What I'm trying to do is create
df2['New Date']
by using the average of days by Project in df1 and add that average to
df2['Date']
. Any thoughts?

Also wanted to add. The 'Days' column is generated from taking the difference between two dates loaded in from an excel spreadsheet.

****EDIT****

df1.head().to_dict('list')

{'Project': ['210001', '210001', '210001', '210001', '210001'], 'Days':
[Timedelta('8 days 00:00:00'), Timedelta('8 days 00:00:00'), Timedelta('12 days
00:00:00'), Timedelta('12 days 00:00:00'), Timedelta('14 days 00:00:00')]}


df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1161 entries, 0 to 1278
Data columns (total 2 columns):
Project 1161 non-null object
Days 1161 non-null timedelta64[ns]
dtypes: object(1), timedelta64[ns](1)
memory usage: 22.7+ KB
None


EDIT #2 INT
This is the error I'm getting on: OverflowError: int too big to convert

df2['New Date'] = df2['Date'] + pd.to_timedelta(df2['Days'], unit='D')

Days
20.569231
15.795455
20.569231

Answer
import pandas as pd
df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']}) 

df2 = pd.DataFrame(
    {'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
     'Project': ['A', 'A', 'C', 'B']})

df1['Days'] = pd.to_timedelta(df1['Days']) 
df2['Date'] = pd.to_datetime(df2['Date'])

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
df2['New Date'] = df2['Date'] + df2['Days']
print(df2)

yields

        Date Project  Days   New Date
0 2016-01-10       A  15.0 2016-01-25
1 2016-01-08       A  15.0 2016-01-23
2 2016-01-02       C   6.0 2016-01-08
3 2016-01-09       B   9.0 2016-01-18

Compute the groupby/mean:

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
#             sum  count    Days
# Project                       
# A       30 days      2 15 days
# B       18 days      2  9 days
# C       12 days      2  6 days

and merge this result with df2 (joining on Project):

df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
#         Date Project    Days
# 0 2016-01-10       A 15 days
# 1 2016-01-08       A 15 days
# 2 2016-01-02       C  6 days
# 3 2016-01-09       B  9 days

Then, add the Days to the Date:

df2['New Date'] = df2['Date'] + df2['Days']