Georg Heiler - 1 year ago
Scala Question

Spark time difference

From Spark 1.5.x on, some convenience methods were added to deal with time, for example:

df.select(datediff($"A", $"B")).show

But this returns the difference in days. Is there any option to convert it to a different interval e.g. years?

Manually I used

df.select(year($"A") - year($"B")).show

which is good enough. But it would be better if the date difference were calculated more accurately, e.g. including months and days.

Is a UDF the go-to solution for the current Spark 2.0.2, or am I missing an option to convert the difference in days to e.g. years?

Answer Source

A timestamp column can be converted to its Unix timestamp in seconds. If this resolution is good enough for you, the rest is just a bunch of arithmetic operations. You can compute the difference in seconds as:

import org.apache.spark.sql.Column

def secondsBetween(col1: Column, col2: Column) =
  col2.cast("timestamp").cast("bigint") - col1.cast("timestamp").cast("bigint")

and re-sample according to your needs, for example:

def minutesBetween(col1: Column, col2: Column) =
  (secondsBetween(col1, col2) / 60).cast("bigint")

Obviously this doesn't handle all the subtleties of date and time processing, such as daylight saving time or leap seconds, so if these are important in your domain, a UDF with a proper time processing library is recommended.
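As a rough illustration of that last suggestion, the calendar-aware part of such a UDF could be sketched with java.time from the JDK. The object name DateDiffSketch and both function names are hypothetical, the inputs are assumed to be non-null java.sql.Timestamp values, and the Spark wiring is only shown in comments because it needs Spark on the classpath:

```scala
import java.sql.Timestamp
import java.time.Period
import java.time.temporal.ChronoUnit

// Hypothetical helper object; none of these names come from Spark itself.
object DateDiffSketch {

  // Calendar-aware difference split into (years, months, days).
  // Only the date part is used; the time of day is ignored.
  def periodBetween(start: Timestamp, end: Timestamp): (Int, Int, Int) = {
    val p = Period.between(
      start.toLocalDateTime.toLocalDate,
      end.toLocalDateTime.toLocalDate)
    (p.getYears, p.getMonths, p.getDays)
  }

  // Number of complete years between the two timestamps.
  def yearsBetween(start: Timestamp, end: Timestamp): Long =
    ChronoUnit.YEARS.between(start.toLocalDateTime, end.toLocalDateTime)
}

// Possible Spark usage (an assumption, not run here), with A and B
// being timestamp columns of a DataFrame df:
//   import org.apache.spark.sql.functions.udf
//   val yearsBetweenUdf = udf(DateDiffSketch.yearsBetween _)
//   df.select(yearsBetweenUdf($"A", $"B")).show
```

Null inputs are not handled in this sketch, so rows with missing values would have to be filtered out or guarded against before calling the UDF.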
