nilesh1212 nilesh1212 - 4 years ago 208
reST (reStructuredText) Question

RestAPI service call from Spark Streaming

I have a use case where I need to call RESTAPI from spark streaming after messages are read from Kafka to perform some calculation and save back the result to HDFS and third party application.

I have few doubts here:

  • How can we call RESTAPI directly from the spark streaming.

  • How to manage RESTAPI timeout with streaming batch time.

Answer Source

This code will not compile as it is. But this the approach for the given usecase.

val conf = new SparkConf().setAppName("App name").setMaster("yarn")
val ssc = new StreamingContext(conf, Seconds(1))

val dstream = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)

dstream.foreachRDD { rdd =>

  //Write the rdd to HDFS directly

  //loop through each parttion in rdd
  rdd.foreachPartition { partitionOfRecords =>

    //1. Create HttpClient object here
    //2.a POST data to API

    //Use it if you want record level control in rdd or partion
    partitionOfRecords.foreach { record =>
      //2.b Post the the date to API
  //Use 2.a or 2.b to POST data as per your req


Most of the HttpClients (for REST call) supports request timeout.

Sample Http POST call with timeout using Apache HttpClient

val CONNECTION_TIMEOUT_MS = 20000; // Timeout in millis (20 sec).

val requestConfig = RequestConfig.custom()


val client: CloseableHttpClient = HttpClientBuilder.create().build();

val url = ""
val post = new HttpPost(url);

//Set config to post

post.setEntity(EntityBuilder.create.setText("some text to post to API").build())

val response: HttpResponse = client.execute(post)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download