Karan Kaushal - 1 month ago
JSON Question

How to parse nested JSON objects in Spark

I have a JSON file with the following schema:

root
|-- demo: boolean (nullable = true)
|-- person: struct (nullable = true)
| |-- dateOfBirth: string (nullable = true)
| |-- email: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- emergencyContacts: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- name: string (nullable = true)
| | | |-- phone: string (nullable = true)
| | | |-- relationship: string (nullable = true)
| |-- id: long (nullable = true)
| |-- name: string (nullable = true)
| |-- phones: struct (nullable = true)
| | |-- home: string (nullable = true)
| | |-- mobile: string (nullable = true)
| |-- registered: boolean (nullable = true)
|-- product: string (nullable = true)
|-- releaseDate: string (nullable = true)


I want to parse the emergencyContacts array to get the names of the contacts.

I have gotten as far as the person struct using:

val df = sqlContext.read.json("file:///home/training211/test/cjson1.json").toDF()
df.registerTempTable("df")
df.printSchema()
val person = df.select("person")
person.registerTempTable("person")
person.printSchema()
person.show()


If I try to go any further, I always get this error:
org.apache.spark.sql.AnalysisException: cannot resolve 'persons.emergencyContacts' given input columns: [person];

I also tried:

val arrayFlatten = df.select($"person.emergencyContacts".getItem(0))


which gives me:

+---------------------------+
|person.emergencyContacts[0]|
+---------------------------+
| [Jane Doe,888-555...|
+---------------------------+


But this is not the result I want.

Any help is appreciated.

Answer

Try the following:

df.select($"person.emergencyContacts").show

If you want to get the phone numbers, you can do something like this:

df.select($"person.emergencyContacts.phone").show

Alternatively, you can iterate over the emergencyContacts array to get the phone and name details. Look up Scala array iteration.
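
Since the question asks for the contact names, here is a minimal sketch of one way to get them: explode the array so each contact becomes its own row, then select the name field. Note that the AnalysisException in the question refers to 'persons', whereas the column in the schema is person. The sketch assumes Spark 2.2 or later (SparkSession and read.json on a Dataset[String]) rather than the sqlContext used in the question. The object name, the contactNames helper, and the sample record (including "John Doe", "Sam Roe", and the phone numbers) are hypothetical and only there to make it runnable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object EmergencyContactNames {
  // Hypothetical helper: one output row per element of
  // person.emergencyContacts, keeping only the name field.
  def contactNames(df: DataFrame): DataFrame =
    df.select(explode(col("person.emergencyContacts")).as("contact"))
      .select(col("contact.name").as("name"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("emergency-contact-names")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for toDS()

    // Hypothetical sample record shaped like the schema in the question;
    // only the fields used here are included.
    val sample =
      """{"person":{"name":"John Doe","emergencyContacts":[""" +
      """{"name":"Jane Doe","phone":"555-0100","relationship":"spouse"},""" +
      """{"name":"Sam Roe","phone":"555-0101","relationship":"friend"}]}}"""
    val df = spark.read.json(Seq(sample).toDS())

    // One row per contact name.
    contactNames(df).show()

    // Without explode, the dotted path yields one array of names per input row.
    df.select(col("person.emergencyContacts.name")).show()

    spark.stop()
  }
}
```

With the DataFrame from the question, the same idea would be df.select(explode($"person.emergencyContacts").as("contact")).select("contact.name").show, assuming the $ implicits are already in scope as in the snippets above.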