Python Question

# Average of Groupby into a dataframe timedelta64[ns]

df1

``````|Project |Days
|A       |20 days
|B       |10 days
|A       |10 days
|C       |5 days
|C       |7 days
|B       |8 days
``````

R = df1['Days'].groupby(df1['Project'])

R

``````|20 days
|10 days
|Name: Days, dtype: timedelta64[ns],('A', 30   15 days)
|10 days
|8 days
|Name: Days, dtype: timedelta64[ns],('B', 18   9 days)
|5 days
|7 days
|Name: Days, dtype: timedelta64[ns],('C', 12   6 days)
``````

df2

``````|Project  |Date     |**New Date**
|A        |1/10/16  |1/25/16
|A        |1/8/16   |1/23/16
|C        |1/2/16   |1/8/16
|B        |1/9/16   |1/18/16
``````

What I'm trying to do is create
`df2['New Date']`
by using the average of days by Project in df1 and add that average to
`df2['Date']`
. Any thoughts?

Also wanted to add. The 'Days' column is generated from taking the difference between two dates loaded in from an excel spreadsheet.

****EDIT****

`````` {'Project': ['210001', '210001', '210001', '210001', '210001'], 'Days':
[Timedelta('8 days 00:00:00'), Timedelta('8 days 00:00:00'), Timedelta('12 days
00:00:00'), Timedelta('12 days 00:00:00'), Timedelta('14 days 00:00:00')]}
``````

df1.info()

``````<class 'pandas.core.frame.DataFrame'>
Int64Index: 1161 entries, 0 to 1278
Data columns (total 2 columns):
Project    1161 non-null object
Days      1161 non-null timedelta64[ns]
dtypes: object(1), timedelta64[ns](1)
memory usage: 22.7+ KB
None
``````

EDIT #2 INT
This is the error I'm getting on: OverflowError: int too big to convert

``````df2['New Date'] = df2['Date'] + pd.to_timedelta(df2['Days'], unit='D')

Days
20.569231
15.795455
20.569231
``````

``````import pandas as pd
df1 = pd.DataFrame(
{'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
'Project': ['A', 'B', 'A', 'C', 'C', 'B']})

df2 = pd.DataFrame(
{'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
'Project': ['A', 'A', 'C', 'B']})

df1['Days'] = pd.to_timedelta(df1['Days'])
df2['Date'] = pd.to_datetime(df2['Date'])

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
df2['New Date'] = df2['Date'] + df2['Days']
print(df2)
``````

yields

``````        Date Project  Days   New Date
0 2016-01-10       A  15.0 2016-01-25
1 2016-01-08       A  15.0 2016-01-23
2 2016-01-02       C   6.0 2016-01-08
3 2016-01-09       B   9.0 2016-01-18
``````

Compute the `groupby/mean`:

``````result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
#             sum  count    Days
# Project
# A       30 days      2 15 days
# B       18 days      2  9 days
# C       12 days      2  6 days
``````

and merge this `result` with `df2` (joining on `Project`):

``````df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
#         Date Project    Days
# 0 2016-01-10       A 15 days
# 1 2016-01-08       A 15 days
# 2 2016-01-02       C  6 days
# 3 2016-01-09       B  9 days
``````

Then, add the `Days` to the `Date`:

``````df2['New Date'] = df2['Date'] + df2['Days']
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download