Queen Queen - 24 days ago 12
Scala Question

Compare the timestamp with a specific date in Spark and Scala

I have the following DataFrame, named Dataframe_add_rank_count_xi_pi_final_chi_square:

+---------------+-----------+-------------+------+----+-----+--------------------+-------------------+------+------+------+-----+--------------------+--------------------+------------------+------------------+------+
| time_stamp_0|sender_ip_1|receiver_ip_2| count|rank| xi| pi| r| ip5| ip4| ip3| ip2| variance| entropy| pre_chi_square| total_chi_square|attack|
+---------------+-----------+-------------+------+----+-----+--------------------+-------------------+------+------+------+-----+--------------------+--------------------+------------------+------------------+------+
|07:19:00.005763| 10.0.0.2| 10.0.0.1|509286| 1|92055|1.963533260289896E-6|0.18075305427598637|111891|115199|190028|92055|1.317855896447428...|2.580232918985576E-5|3.7131630265751565|14.852652106300626| 1|
|07:19:00.005788| 10.0.0.2| 10.0.0.1|509286| 2|92055|3.927066520579792E-6|0.18075305427598637|111891|115199|190028|92055|6.498626409377348E-6|4.888262329310028E-5|18.310392943472664|14.852652106300626| 1|
|07:19:00.005807| 10.0.0.2| 10.0.0.1|509286| 3|92055|5.890599780869688E-6|0.18075305427598637|111891|115199|190028|92055|1.560646344288706E-5|7.093550226267817E-5| 43.9724428049685|14.852652106300626| 1|


I need to set the attack field to 0 when both conditions hold: the timestamp is later than "07:19:00.005788" and sender_ip_1 is equal to 10.0.0.3.

However, I don't know how to compare a timestamp with a specific date inside a condition in Scala. Here is my code:

val final_add_count_rank_xi_pi_r_attack = Dataframe_add_rank_count_xi_pi_final_chi_square
.withColumn("attack",
when($"sender_ip_1" === "10.0.0.3"
and ($"time_stamp_0").cast(TimestampType) > "07:19:00.005788", 0)
.otherwise(1))


Can anybody help me?

Answer

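A simple lexicographic (string) comparison works here for the time_stamp_0 column: the values are zero-padded and fixed-width, so string order matches chronological order, and no cast is needed.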

import org.apache.spark.sql.functions._
import spark.implicits._

val final_add_count_rank_xi_pi_r_attack = Dataframe_add_rank_count_xi_pi_final_chi_square
  .withColumn("attack",
    when($"sender_ip_1" === "10.0.0.3"
      && $"time_stamp_0" > "07:19:00.005788", 0)
      .otherwise(1))
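
To see why comparing the values as plain strings is safe for this data, here is a minimal plain-Scala sketch (no Spark needed). It assumes time_stamp_0 is a string column in the zero-padded HH:mm:ss.SSSSSS format shown in the question; the object and method names are hypothetical and only for illustration.

```scala
import java.time.LocalTime

object LexicographicTimeCheck {
  // Sample values copied from the time_stamp_0 column in the question.
  val samples: Seq[String] =
    Seq("07:19:00.005763", "07:19:00.005788", "07:19:00.005807")

  val threshold = "07:19:00.005788"

  // Plain string (lexicographic) comparison, the same kind of ordering
  // a string column compared against a string literal relies on.
  def laterAsString(t: String): Boolean = t > threshold

  // Chronological comparison via java.time, used here as the reference.
  def laterAsTime(t: String): Boolean =
    LocalTime.parse(t).isAfter(LocalTime.parse(threshold))

  // Counter-example: without zero-padding the two orderings disagree,
  // because the character '7' sorts after '1'.
  val unpaddedPitfall: Boolean = "7:19:00" > "10:00:00"

  def main(args: Array[String]): Unit =
    samples.foreach { t =>
      println(s"$t string=${laterAsString(t)} time=${laterAsTime(t)}")
    }
}
```

For the three sample rows, both comparisons agree. The shortcut stops being safe if the column ever holds values that are not zero-padded (for example "7:19:00"); in that case, parsing to a real time or timestamp type before comparing is the more robust choice.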