dgnball - 2 months ago
Java Question

Get JSON into Apache Spark from a web source in Java

I have a web server which returns JSON data that I would like to load into an Apache Spark DataFrame. Right now I have a shell script that uses wget to write the JSON data to a file and then runs a Java program that looks something like this:

DataFrame df = sqlContext.read().json("example.json");

I have looked at the Apache Spark documentation and there doesn't seem to be a way to join these two steps together automatically. There must be a way of requesting JSON data in Java, storing it as an object and then converting it to a DataFrame, but I haven't been able to figure it out. Can anyone help?


You could store the JSON data in a list of Strings, like this:

final String JSON_STR0 = "{\"name\":\"0\",\"address\":{\"city\":\"0\",\"region\":\"0\"}}";
final String JSON_STR1 = "{\"name\":\"1\",\"address\":{\"city\":\"1\",\"region\":\"1\"}}";
List<String> jsons = Arrays.asList(JSON_STR0, JSON_STR1);

where each String represents a JSON object.

Then you could transform the list to an RDD:

// Assuming sc is a JavaSparkContext, parallelize returns a JavaRDD<String>
JavaRDD<String> jsonRDD = sc.parallelize(jsons);

Once you have the RDD, it's easy to get a DataFrame:

DataFrame data = sqlContext.read().json(jsonRDD);
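
The snippets above start from JSON strings that are already in memory. To replace the wget step from the question, the HTTP response could be read directly in Java. Here is a minimal sketch, assuming the server returns one JSON object per line; the class name JsonFetcher and its method names are hypothetical, and only JDK classes are used:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: turns an HTTP response into a list of JSON strings.
public class JsonFetcher {

    // Reads the stream line by line and keeps every non-blank line.
    // Assumption: each line holds one complete JSON object.
    public static List<String> readJsonLines(InputStream in) throws IOException {
        List<String> jsons = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    jsons.add(line);
                }
            }
        }
        return jsons;
    }

    // Opens the given URL and reads its body; the stream is closed by readJsonLines.
    public static List<String> fetchJsonLines(String url) throws IOException {
        return readJsonLines(new URL(url).openStream());
    }
}
```

With a helper like this, the list passed to sc.parallelize could come from JsonFetcher.fetchJsonLines called with your server's URL instead of hard-coded strings. If the server returns a single JSON document spread over several lines, the body would likely need to be joined into one String first.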