nilesh1212 - 17 days ago 8
JSON Question

Spark Streaming Sliding Window max and min

I am a beginner with Spark. I am working on a Spark Streaming use case where I receive JSON messages. Each message has an attribute 'value', which is a double; after parsing the JSON I get an Array[Double]. I want to find max(value) and min(value) over the last 15 seconds, with a sliding interval of 2 seconds.
Here is my code:

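val record = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicMap, StorageLevel.MEMORY_ONLY_SER_2)
val lines = record.map(_._2) // keep only the message payload, drop the key

// Parse each JSON record, then keep a 15 s window that slides every 2 s
val valueDtsream: DStream[Array[Double]] = lines
  .map { jsonRecord => parseJson(jsonRecord) }
  .window(Seconds(15), Seconds(2))

valueDtsream.foreachRDD { rdd =>
  // rdd.isEmpty() also covers the case of partitions that hold no records
  if (!rdd.isEmpty()) {
    // code to find min and max
  }
}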

ssc.start()
ssc.awaitTermination()

Answer

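Try transform together with stats, which computes the min and max (along with count, mean and variance) in a single pass over each windowed RDD: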

valueDtsream.transform( rdd => {
  val stats = rdd.flatMap(x => x).stats
  rdd.sparkContext.parallelize(Seq((stats.min, stats.max)))
})
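
Note that stats on an empty RDD reports min = Infinity and max = -Infinity, so you may want to filter those pairs out. If min and max are all you need, another option is to reduce (min, max) pairs with reduceByWindow instead of computing full statistics. This is only a minimal sketch, assuming it is given the parsed stream before .window is applied (reduceByWindow does the windowing itself); WindowMinMax, merge, minMax and parsed are illustrative names, not from the question:

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

object WindowMinMax {
  // Combine two (min, max) pairs into one pair covering both.
  def merge(a: (Double, Double), b: (Double, Double)): (Double, Double) =
    (math.min(a._1, b._1), math.max(a._2, b._2))

  // parsed is assumed to be the un-windowed stream of parsed arrays,
  // i.e. lines.map(parseJson) from the question without .window(...).
  def minMax(parsed: DStream[Array[Double]]): DStream[(Double, Double)] =
    parsed
      .flatMap(_.toSeq)   // one stream element per double value
      .map(v => (v, v))   // a single value is both its own min and max
      .reduceByWindow(merge _, Seconds(15), Seconds(2)) // 15 s window, 2 s slide
}
```

You would then call something like WindowMinMax.minMax(lines.map(parseJson)).print() before ssc.start(). Either way, the window and slide durations have to be multiples of the batch interval of the StreamingContext, so a 2 second slide needs a batch interval that divides 2 seconds evenly (for example 1 second).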