Béatrice Moissinac - 1 month ago
Javascript Question

Spark 2.0.0 - JSON malformed output

I am processing data with Spark and Scala, and saving it as json:


df2.write.mode("overwrite").json("mydata")


The output looks like this:

{"GPS_LAT":xx.xxxxx,"GPS_LONG":xx.xxxxx,"count":10063}
{"GPS_LAT":xx.xxxxx,"GPS_LONG":xx.xxxxx,"count":3142}
{"GPS_LAT":xx.xxxxx,"GPS_LONG":xx.xxxxx,"count":7766}


I use the data to create a visualization with d3, loading it with d3.json:

d3.json("mydata.json", function(d){
    console.log(d);
});


My problem is that d3.js expects json to be formatted as follows:

[{"GPS_LAT":xx.xxxxx,"GPS_LONG":xx.xxxxx,"count":10063},
{"GPS_LAT":xx.xxxxx,"GPS_LONG":xx.xxxxx,"count":3142},
{"GPS_LAT":xx.xxxxx,"GPS_LONG":xx.xxxxx,"count":7766}]


Who is wrong? Spark or d3? What can I do to alleviate this situation without having to manually add [,]?

Answer

I don't know Spark, but I can say that this file is not valid JSON as a whole: you just have a bunch of objects, one per line, that are not wrapped in an array. So, for "who is wrong?", I'd say Spark.
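A quick way to check that claim is to hand both shapes to JSON.parse, which is roughly what a JSON loader does with the file contents. This is only a sketch: isValidJson is a helper name made up here, and the two sample strings are shortened stand-ins for the real output. One object per line is often called JSON Lines or newline-delimited JSON; each line parses on its own, but the file as a whole does not.

```javascript
// Hypothetical helper: true if the whole string parses as one JSON document.
function isValidJson(text) {
    try {
        JSON.parse(text);
        return true;
    } catch (e) {
        return false;
    }
}

// Shortened stand-ins for the two shapes in the question.
var onePerLine = '{"count":10063}\n{"count":3142}';
var wrappedInArray = '[{"count":10063},\n{"count":3142}]';

console.log(isValidJson(onePerLine));     // false
console.log(isValidJson(wrappedInArray)); // true
```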

But there is an (ugly) workaround. Use d3.text to load that thing (that bunch of objects):

d3.text("data.json", function(data){});

Then, your data will be a string. The next step is splitting the string by new lines:

data = data.match(/[^\r\n]+/g);

And then we transform this into an array of objects:

data = data.map(function(d){
    return JSON.parse(d)
});

All together:

d3.text("data.json", function(data){

    data = data.match(/[^\r\n]+/g);
    data = data.map(function(d){
        return JSON.parse(d)
    });
    //now you can use 'data' here

});

Check the console in this plunker: https://plnkr.co/edit/ER1oXyWZL62dwxlgaenP?p=preview

And, now that you have an array of objects, you can pass it to your D3 code.
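If you want the split-and-parse steps in one reusable place, something like the sketch below might do. parseJsonLines is a name invented for this example (it is not part of D3 or Spark), and the sample string uses made-up coordinates in the same shape as the Spark output. Unlike the match-based version, it also copes with an empty file and with blank or whitespace-only lines.

```javascript
// Hypothetical helper: turn one-object-per-line text into an array of objects.
function parseJsonLines(text) {
    return text
        .split(/\r?\n/)                                        // one record per line
        .map(function (line) { return line.trim(); })
        .filter(function (line) { return line.length > 0; })   // skip blank lines
        .map(function (line) { return JSON.parse(line); });
}

// Made-up sample in the same shape as the Spark output.
var sample = '{"GPS_LAT":1.5,"GPS_LONG":2.5,"count":10063}\n' +
             '{"GPS_LAT":3.5,"GPS_LONG":4.5,"count":3142}\n';

var rows = parseJsonLines(sample);
console.log(rows.length);   // 2
console.log(rows[0].count); // 10063
```

Inside the d3.text callback, data = parseJsonLines(data) would then replace the two separate steps.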

PS: This may not work if you have dates in the data.
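The PS doesn't say what exactly goes wrong with dates. One likely reading is that JSON has no date type, so JSON.parse hands dates back as plain strings. If that is the issue, a reviver function passed to JSON.parse can convert them. A minimal sketch, assuming a hypothetical "event_time" field holding an ISO 8601 string (no such field appears in the question's data):

```javascript
// Assumption: each line has a hypothetical "event_time" field (ISO 8601 string).
function parseLineWithDates(line) {
    return JSON.parse(line, function (key, value) {
        // Convert only the chosen field; leave every other value untouched.
        return key === "event_time" ? new Date(value) : value;
    });
}

var row = parseLineWithDates('{"event_time":"2016-09-01T12:00:00Z","count":5}');
console.log(row.event_time instanceof Date);  // true
console.log(row.event_time.getUTCFullYear()); // 2016
console.log(row.count);                       // 5
```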