smeeb - 5 months ago

How to read in-memory JSON string into Spark DataFrame

I'm trying to read an in-memory JSON string into a Spark DataFrame on the fly:

var someJSON : String = getJSONSomehow()
val someDF : DataFrame = magic.convert(someJSON)

I've spent quite a bit of time looking at the Spark API, and the best I can find is to write the JSON out to a temp file and read it back in from that path, like so:

var someJSON : String = getJSONSomehow()
val tmpFile : Output = Resource
val someDF : DataFrame =

But this feels kind of awkward/wonky and imposes the following constraints:

  1. It requires me to format my JSON to one object per line (per documentation); and

  2. It forces me to write the JSON to a temp file, which is slow and awkward; and

  3. It forces me to clean up temp files over time, which is cumbersome and feels "wrong" to me.

So I ask: Is there a direct and more efficient way to convert a JSON string into a Spark DataFrame?


From the Spark SQL guide:

val otherPeopleRDD = spark.sparkContext.makeRDD(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleRDD)

This creates a DataFrame from an intermediate RDD (created by passing a String).
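On Spark 2.2 and later, the `RDD[String]` overload of `json` is deprecated in favor of one that takes a `Dataset[String]`, so the intermediate RDD can be skipped. Here is a minimal sketch of that variant, assuming a `SparkSession` is available. The object name `JsonStringToDataFrame`, the helper `jsonToDF`, and the `local[*]` master are made up for illustration; only `spark.read.json` and `toDS()` come from Spark itself.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonStringToDataFrame {

  // Hypothetical helper: each String in `records` is parsed as one JSON record.
  // Nothing is written to disk, so there are no temp files to clean up.
  def jsonToDF(spark: SparkSession, records: Seq[String]): DataFrame = {
    import spark.implicits._          // provides the String encoder needed by toDS()
    spark.read.json(records.toDS())   // DataFrameReader.json(Dataset[String]), Spark 2.2+
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-string-to-df")
      .master("local[*]")             // assumption: local run, just for the example
      .getOrCreate()

    // Stand-in for getJSONSomehow() from the question
    val someJSON = """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}"""

    val someDF = jsonToDF(spark, Seq(someJSON))
    someDF.printSchema()              // schema is inferred from the JSON itself
    someDF.show()

    spark.stop()
  }
}
```

Nested fields can then be selected with dot notation, e.g. `someDF.select("address.city")`. If you are stuck on a Spark version older than 2.2, the RDD-based form from the guide is the one to use.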