Bio Bio - 4 months ago 19

Split timestamp column into two separate date and time columns with Python

I have a dataset called "df_no_missing".

print(df_no_missing.dtypes)



TIMESTAMP object
P_ACT_KW float64
PERIODE_TARIF object
P_SOUSCR float64
SITE object
TARIF object
depassement float64
dtype: object



I tried to extract the date and the time from the TIMESTAMP column into two separate columns, so I did:

dt = datetime.strptime('TIMESTAMP', '%d/%m/%y %H:%M')
df_no_missing['date'] = df_no_missing['TIMESTAMP'].dt.date
df_no_missing['time'] = df_no_missing['TIMESTAMP'].dt.time


But I got an error:

> ValueError                                Traceback (most recent call last)
> <ipython-input-185-6599284ba17f> in <module>()
>       1 print(df_no_missing.dtypes)
>       2 df_no_missing.head()
> ----> 3 dt = datetime.strptime('TIMESTAMP', '%d/%m/%y %H:%M')
>       4 df_no_missing['date'] = df_no_missing['TIMESTAMP'].dt.date
>       5 df_no_missing['time'] = df_no_missing['TIMESTAMP'].dt.time
>
> C:\Users\Demonstrator\Anaconda3\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
>     508     """Return a class cls instance based on the input string and the
>     509     format string."""
> --> 510     tt, fraction = _strptime(data_string, format)
>     511     tzname, gmtoff = tt[-2:]
>     512     args = tt[:6] + (fraction,)
>
> C:\Users\Demonstrator\Anaconda3\lib\_strptime.py in _strptime(data_string, format)
>     341     if not found:
>     342         raise ValueError("time data %r does not match format %r" %
> --> 343                          (data_string, format))
>     344     if len(data_string) != found.end():
>     345         raise ValueError("unconverted data remains: %s" %
>
> ValueError: time data 'TIMESTAMP' does not match format '%d/%m/%y %H:%M'


Here is the CSV file:

TIMESTAMP;P_ACT_KW;PERIODE_TARIF;P_SOUSCR;SITE;TARIF
31/07/2015 23:00;12;HC;;ST GEREON;TURPE_HTA5
31/07/2015 23:10;466;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:20;18;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:30;17;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:40;13;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:50;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:00;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:10;14;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:20;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:30;20;HC;425;ST GEREON;TURPE_HTA5


Any idea how to fix this?

Thank you in advance.

Best regards

Answer

IIUC you want:

df_no_missing['TIMESTAMP'] = pd.to_datetime(df_no_missing['TIMESTAMP'], format='%d/%m/%Y %H:%M')

Then, after the conversion, you can use .dt.date and .dt.time to build the two columns.
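A minimal sketch of the two steps, assuming the strings look like the CSV sample in the question (day/month/four-digit year, then hours:minutes). Note %Y rather than %y, because the years have four digits, and that the format is passed by keyword: the second positional parameter of pd.to_datetime is not the format. The small DataFrame is only a stand-in for df_no_missing.

```python
import pandas as pd

# Stand-in for df_no_missing, using a few rows from the question's CSV sample
df_no_missing = pd.DataFrame({
    "TIMESTAMP": ["31/07/2015 23:50", "01/08/2015 00:00", "01/08/2015 00:10"],
    "P_ACT_KW": [13.0, 13.0, 14.0],
})

# Convert the strings to datetime64; %d/%m = day first, %Y = four-digit year
df_no_missing["TIMESTAMP"] = pd.to_datetime(
    df_no_missing["TIMESTAMP"], format="%d/%m/%Y %H:%M"
)

# The .dt accessor now works: .dt.date yields datetime.date objects and
# .dt.time yields datetime.time objects (both new columns have dtype object)
df_no_missing["date"] = df_no_missing["TIMESTAMP"].dt.date
df_no_missing["time"] = df_no_missing["TIMESTAMP"].dt.time

print(df_no_missing[["date", "time"]])
```

If you later need to filter or group by day, keeping TIMESTAMP as datetime64 and using .dt.normalize() or .dt.floor('D') avoids object columns, but the version above matches what the question asks for.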

Also, you need to post what the datetime strings look like.
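With the CSV sample in the question, the reason for the ValueError is visible: datetime.strptime parses one actual string, but it was given the literal text 'TIMESTAMP' (the column name). A minimal sketch on a single value from the file, which also shows that '%y' (two-digit year) cannot match a year like 2015:

```python
from datetime import datetime

# strptime works on ONE string value, not on a column name
value = "31/07/2015 23:00"

# %Y matches a four-digit year such as 2015
parsed = datetime.strptime(value, "%d/%m/%Y %H:%M")
print(parsed.date(), parsed.time())  # 2015-07-31 23:00:00

# %y expects a two-digit year, so the original format string fails on this data
try:
    datetime.strptime(value, "%d/%m/%y %H:%M")
    two_digit_ok = True
except ValueError:
    two_digit_ok = False
print(two_digit_ok)  # False
```

For a whole column, pd.to_datetime with the same format string is the vectorised equivalent, so there is no need to loop over rows with strptime.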

EDIT

You can tell read_csv to parse your date strings while loading the file:

In [42]:
import pandas as pd
import io
t="""TIMESTAMP;P_ACT_KW;PERIODE_TARIF;P_SOUSCR;SITE;TARIF
31/07/2015 23:00;12;HC;;ST GEREON;TURPE_HTA5
31/07/2015 23:10;466;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:20;18;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:30;17;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:40;13;HC;425;ST GEREON;TURPE_HTA5
31/07/2015 23:50;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:00;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:10;14;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:20;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:30;20;HC;425;ST GEREON;TURPE_HTA5"""
df = pd.read_csv(io.StringIO(t), sep=';', parse_dates=[0], dayfirst=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
TIMESTAMP        10 non-null datetime64[ns]
P_ACT_KW         10 non-null int64
PERIODE_TARIF    10 non-null object
P_SOUSCR         9 non-null float64
SITE             10 non-null object
TARIF            10 non-null object
dtypes: datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 560.0+ bytes
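One thing worth checking with this data: a string like 01/08/2015 is ambiguous, and depending on the pandas version, parse_dates=[0] on its own may read it month-first (8 January) while 31/07/2015 can only be read day-first, so the dtype can look right even when some dates are wrong. A sketch that passes dayfirst=True to make the day-first intent explicit and checks the rows on either side of the month boundary:

```python
import io

import pandas as pd

# Two rows from the question's sample, straddling the July/August boundary
t = """TIMESTAMP;P_ACT_KW;PERIODE_TARIF;P_SOUSCR;SITE;TARIF
31/07/2015 23:50;13;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:00;13;HC;425;ST GEREON;TURPE_HTA5"""

# dayfirst=True: 01/08/2015 means 1 August 2015, not 8 January 2015
df = pd.read_csv(io.StringIO(t), sep=";", parse_dates=[0], dayfirst=True)

print(df["TIMESTAMP"].tolist())
```

Checking a value or two after loading, as here, is a cheap way to confirm that the parser read the dates the way the file means them.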

So in your case:

df = pd.read_csv(your_file, sep=';', parse_dates=[0], dayfirst=True)

should just work.
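Putting both parts together for the original goal, a sketch that loads the data with the timestamps parsed and then adds the separate date and time columns. your_file above is a placeholder, so an in-memory sample stands in for it here. The last few lines show a different technique, plainly named: if plain text pieces are enough and you don't need date/time objects, splitting the raw string on the space with str.split(..., expand=True) avoids datetime parsing altogether.

```python
import io

import pandas as pd

# Stand-in for your_file: a few lines from the question's CSV
sample = io.StringIO("""TIMESTAMP;P_ACT_KW;PERIODE_TARIF;P_SOUSCR;SITE;TARIF
31/07/2015 23:00;12;HC;;ST GEREON;TURPE_HTA5
31/07/2015 23:10;466;HC;425;ST GEREON;TURPE_HTA5
01/08/2015 00:00;13;HC;425;ST GEREON;TURPE_HTA5""")

# Parse the first column as datetimes while loading (day first, as in the file)
df = pd.read_csv(sample, sep=";", parse_dates=[0], dayfirst=True)

# Two new columns holding datetime.date and datetime.time objects
df["date"] = df["TIMESTAMP"].dt.date
df["time"] = df["TIMESTAMP"].dt.time

# Alternative without datetime parsing: split the original strings on the space,
# which yields plain strings such as '31/07/2015' and '23:00'
raw = pd.Series(["31/07/2015 23:00", "01/08/2015 00:00"])
parts = raw.str.split(" ", expand=True)
parts.columns = ["date_str", "time_str"]

print(df[["TIMESTAMP", "date", "time"]])
print(parts)
```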
