Eugen Eugen - 29 days ago 18
REST Question

Spark Streaming REST Custom Receiver

Is it possible to use a REST API in a custom receiver for Spark Streaming?

I want to make multiple calls / reads against that API asynchronously and feed the results into Spark Streaming.

Answer

A custom receiver can be any process that produces data asynchronously. Typically, onStart() starts a thread that runs a receive() method (a helper by convention, not part of the Receiver API), which sends async requests to your REST server, for example using Futures and a dedicated thread pool. When a future completes, it calls store(data) to hand the result to the Spark Streaming job. In a nutshell:

  • def onStart() => starts the thread(s) that manage the async request/response handling; it must return without blocking
  • def receive() => continuously does the I/O and reports the results by calling store(...)
  • def onStop() => stops those threads and cleans up whatever onStart created.
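
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name RestReceiver, its parameters endpoints and pollIntervalMs, the pool size of 4, and the fetch helper are all assumptions made for illustration. It uses scala.io.Source for the HTTP GET only to stay dependency-free; a real receiver would probably want an HTTP client with timeouts.

```scala
import java.util.concurrent.{ExecutorService, Executors, RejectedExecutionException}

import scala.concurrent.{ExecutionContext, Future}
import scala.io.Source
import scala.util.{Failure, Success}
import scala.util.control.NonFatal

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver: polls every URL in `endpoints` once per `pollIntervalMs`.
class RestReceiver(endpoints: Seq[String], pollIntervalMs: Long = 1000L)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  // Created in onStart and released in onStop. Marked @transient because the
  // receiver object is serialized and shipped to an executor before onStart runs.
  @transient private var pool: ExecutorService = _

  override def onStart(): Unit = {
    pool = Executors.newFixedThreadPool(4) // assumed size, tune as needed
    // onStart must not block, so the polling loop gets its own thread.
    val loop = new Thread("rest-receiver-loop") {
      override def run(): Unit = receive()
    }
    loop.setDaemon(true)
    loop.start()
  }

  override def onStop(): Unit = {
    // Cancels in-flight requests; the polling loop exits on its next check.
    if (pool != null) pool.shutdownNow()
  }

  // Helper (not part of the Receiver API): issues async requests until stopped.
  private def receive(): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      while (!isStopped()) {
        endpoints.foreach { url =>
          Future(fetch(url)).onComplete {
            case Success(body) => if (!isStopped()) store(body)
            case Failure(e)    => reportError(s"Request to $url failed", e)
          }
        }
        Thread.sleep(pollIntervalMs)
      }
    } catch {
      case _: RejectedExecutionException => // pool was shut down by onStop
      case _: InterruptedException       => // thread interrupted, just exit
      case NonFatal(e)                   => restart("Polling loop failed", e)
    }
  }

  // Blocking GET that returns the response body as a string (no timeouts set).
  private def fetch(url: String): String = {
    val src = Source.fromURL(url, "UTF-8")
    try src.mkString finally src.close()
  }
}
```

Assuming an existing StreamingContext named ssc and a placeholder URL, it would be wired in with ssc.receiverStream(new RestReceiver(Seq("http://localhost:8080/events"))). Note that if a request takes longer than the poll interval, requests queue up in the pool, so the interval and pool size need to match the latency of the API.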

There's an example in the custom receivers docs.