Dmitry Polonskiy - 3 months ago
Python Question

Reducing a DateTime Object in PySpark

I have two DataFrames. One has a date-only datetime such as datetime.datetime(2014, 2, 1), and the other has a full timestamp such as pickup_time=datetime.datetime(2014, 2, 9, 14, 51). The problem is that I cannot join the two DataFrames, because one of them carries hours/minutes/seconds and PySpark treats the values as unequal. Should I reformat the datetime in the DataFrame that has the extra time components, or is there a way to join the DataFrames that disregards the hours/minutes/seconds? How would I go about doing this?


You can cast the types during the join, for example:

>>> df1.first()
Row(date=datetime.datetime(2016, 11, 11))
>>> df2.first()
Row(date=datetime.datetime(2016, 11, 11, 21, 8))
>>> df1.join(df2, df1.date.cast('date') == df2.date.cast('date')).first()
Row(date=datetime.datetime(2016, 11, 11), date=datetime.datetime(2016, 11, 11, 21, 8))
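To see what the cast to 'date' is doing, here is a plain-Python sketch (with hypothetical sample records, not the asker's data) that truncates each datetime to its calendar date before matching rows, the same effect as casting both join keys to 'date':

```python
import datetime

# Hypothetical sample records mirroring the two DataFrames:
# df1_rows holds date-only values, df2_rows holds full timestamps.
df1_rows = [{"date": datetime.datetime(2016, 11, 11)}]
df2_rows = [{"date": datetime.datetime(2016, 11, 11, 21, 8)}]

def join_on_date(left, right):
    """Pair up rows whose datetimes fall on the same calendar day.

    Calling .date() drops the hours/minutes/seconds, which is what
    casting a timestamp column to 'date' does on the Spark side.
    """
    joined = []
    for l in left:
        for r in right:
            if l["date"].date() == r["date"].date():
                joined.append((l, r))
    return joined

result = join_on_date(df1_rows, df2_rows)
# The two rows match even though only one of them has a time-of-day.
```

Note that the cast happens only inside the join condition, so both DataFrames keep their original timestamp columns in the joined result.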