user2242044 user2242044 - 6 months ago 52
Python Question

Calculating Elapsed Days From Pandas Dataframe Strings

I have a

Pandas
dataframe
that stores travel dates of people. I'd like to add a column that shows the length of the stay. To do this the
string
needs to be parsed, converted to a
datetime
and subtracted.
Pandas
seems to be treating the
datetime
conversion as a whole series and not individual
strings
as a I get
TypeError: must be string, not Series
. I like to do this with a non-looping option as the actual dataset is quite large, but need a bit of help.

import pandas as pd
from datetime import datetime

df = pd.DataFrame(data=[['Bob', '12 Mar 2015 - 31 Mar 2015'], ['Jessica', '27 Mar 2015 - 31 Mar 2015']], columns=['Names', 'Day of Visit'])
df['Length of Stay'] = (datetime.strptime(df['Day of Visit'][:11], '%d %b %Y') - datetime.strptime(df['Day of Visit'][-11:], '%d %b %Y')).days + 1
print df


Desired Output:

Names Day of Visit Length of Stay
0 Bob 12 Mar 2015 - 31 Mar 2015 20
1 Jessica 27 Mar 2015 - 31 Mar 2015 5

Answer

Use Series.str.extract to split the Day of Visit column into two separate columns. Then use pd.to_datetime to parse the columns as dates. Computing the Length of Stay can then be done by subtracting the date columns and adding 1:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[['Bob', '12 Mar 2015 - 31 Mar 2015'], ['Jessica', '27 Mar 2015 - 31 Mar 2015']], columns=['Names', 'Day of Visit'])
tmp = df['Day of Visit'].str.extract(r'([^-]+)-(.*)', expand=True).apply(pd.to_datetime)
df['Length of Stay'] = (tmp[1] - tmp[0]).dt.days + 1
print(df)

yields

     Names               Day of Visit  Length of Stay
0      Bob  12 Mar 2015 - 31 Mar 2015              20
1  Jessica  27 Mar 2015 - 31 Mar 2015               5

The regex pattern ([^-]+)-(.*) means

(              # start group #1
  [            # begin character class
    ^-         # any character except a literal minus sign `-`
  ]            # end character class 
   +           # match 1-or-more characters from the character class
)              # end group #1
-              # match a literal minus sign 
(              # start group #2
  .*           # match 0-or-more of any character
)              # end group #2

.str.extract returns a DataFrame with the matching text from groups #1 and #2 in columns.

Comments