Tobias Gent - 6 days ago 5
HTTP Question

How would I implement an HTTP Spout for Apache Storm in Java?

I recently started with Apache Storm and just finished building my first topologies (all in Java).

As a next step, I wanted to feed sensor values from a TI SensorTag, which is connected to a Raspberry Pi, into one of these topologies.

I'm able to send the sensor data via HTTP, but I'm not sure how to implement a working spout that accepts these requests.

Idea of the topology: It should take in the HTTP requests with the sensor values, emit this data into the topology, and then write it to a file/database using a bolt.

So far, I found a post on Stack Overflow about an HTTP spout (Storm : Spout for reading data from a port), but sadly I was not allowed to leave a comment or send a private message (sorry if I missed something there).
I'm not sure exactly how this spout works and wanted to ask for example code (basically, I wanted to know how the whole thing was set up in the topology).

I also tried to use Storm's DRPC feature (http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) to get my HTTP requests into the topology, but I wasn't able to get any further with the documentation and the storm-starter examples, because I'm still learning how to use Storm properly. I was really confused about setting up the DRPC server and configuring it to listen for incoming requests.

So I wanted to know whether anyone else has faced this problem and found a solution, or can give me advice on what else I could try.

Would such an HTTP spout (a socket connection, as far as I've seen?!) or a DRPC server work?

PS: A code template, other examples, or any other sources of information that could help me understand this topic would also be nice!

Answer

I would instead write a servlet to consume those HTTP requests and, on receiving a request, write the relevant information to Kafka. You can then use the Kafka spout (I would write my own spout, but that's a whole different question) to read that data and emit it into your topology. The primary benefit of using Kafka as an intermediate staging location is the ability to replay your data by resetting the committed Kafka offsets.
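
To make the shape of that pipeline concrete, here is a rough, dependency-free sketch rather than a real implementation. It makes several assumptions: the JDK's built-in `com.sun.net.httpserver.HttpServer` stands in for the servlet, and a small in-memory class (`ReplayableLog`, a hypothetical name) stands in for a Kafka topic, so none of this is the Kafka or Storm API. The `/sensor` path and the class names are invented for illustration. The only point is to show an HTTP endpoint appending to an offset-addressed log, and a reader replaying by moving its offset back.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one Kafka partition: an append-only log read by offset. */
class ReplayableLog {
    private final List<String> records = new ArrayList<>();
    private int committedOffset = 0; // index of the next record the reader will get

    /** Appends a record at the end of the log. */
    synchronized void append(String record) {
        records.add(record);
    }

    /** Returns the next unread record and advances the offset, or null if caught up. */
    synchronized String poll() {
        if (committedOffset >= records.size()) {
            return null;
        }
        return records.get(committedOffset++);
    }

    /** Moves the reader to the given offset; later polls deliver records from there. */
    synchronized void seek(int offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        committedOffset = offset;
    }
}

/** HTTP intake (hypothetical): every POST body to /sensor becomes one log record. */
class SensorIngestSketch {
    /** Starts the endpoint on the given port; port 0 lets the OS pick a free one. */
    static HttpServer start(ReplayableLog log, int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/sensor", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                exchange.close();
                return;
            }
            byte[] body = exchange.getRequestBody().readAllBytes();
            log.append(new String(body, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1); // accepted, no response body
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

In a real setup the handler would hand the body to a Kafka producer instead of calling `log.append`, and the spout's `nextTuple()` would emit whatever the consumer returns instead of calling `log.poll()`. Replaying then corresponds to `log.seek(0)` here: the reader sees the same sensor values again without the Raspberry Pi having to resend them.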
