Mr.spark - 27 days ago
Java Question

Spark SQL: unable to query

I have the data schema of a LinkedIn account, as shown below. I need to query the skills, which are stored as an array. The array may contain JAVA, java, Java, JAVA developer, or Java developer.

// AND binds tighter than OR, so the OR conditions are grouped in parentheses.
Dataset<Row> sqlDF = spark.sql("SELECT * FROM people"
+ " WHERE (ARRAY_CONTAINS(skills,'Java') "
+ " OR ARRAY_CONTAINS(skills,'JAVA')"
+ " OR ARRAY_CONTAINS(skills,'Java developer')) "
+ "AND ARRAY_CONTAINS(experience['description'],'Java developer')" );
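In Spark SQL, as in Java, AND binds more tightly than OR. A WHERE clause shaped like `A OR B OR C AND D` is therefore evaluated as `A OR B OR (C AND D)`, so a row whose skills contain 'Java' is returned even when the experience condition fails. The sketch below shows the difference using plain Java booleans, with no Spark involved. The class and method names are hypothetical.

```java
class PrecedenceDemo {
    // Same shape as "A OR B OR C AND D"; Java parses it as a || b || (c && d).
    static boolean unparenthesized(boolean a, boolean b, boolean c, boolean d) {
        return a || b || c && d;
    }

    // The OR conditions are grouped first, then combined with d.
    static boolean parenthesized(boolean a, boolean b, boolean c, boolean d) {
        return (a || b || c) && d;
    }

    public static void main(String[] args) {
        // A row whose skills contain "Java" but whose description does not match.
        boolean a = true, b = false, c = false, d = false;
        System.out.println("unparenthesized: " + unparenthesized(a, b, c, d));
        System.out.println("parenthesized:   " + parenthesized(a, b, c, d));
    }
}
```

For that row, the unparenthesized form keeps it and the parenthesized form filters it out.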

Answer
df.printSchema()

root
 |-- skills: array (nullable = true)
 |    |-- element: string (containsNull = true)


df.show()

+--------------------+
|              skills|
+--------------------+
|        [Java, java]|
|[Java Developer, ...|
|               [dev]|
+--------------------+
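The case-insensitive matching the question needs can be sketched in plain Java. Trim and lower-case each array element, then compare it against the lower-cased variants from the question, so that JAVA, java, Java, JAVA developer and Java developer all match. `SkillMatcher`, `hasJavaSkill` and the target set are hypothetical names chosen for illustration. The second sample row keeps only its visible element because `show()` truncated the rest. In Spark SQL the same per-element test could probably be written with the `exists` higher-order function (Spark 2.4 or later), for example `exists(skills, s -> lower(trim(s)) IN ('java', 'java developer'))`. This sketch does not run Spark itself, and it does not cover the `experience` condition.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SkillMatcher {
    // Hypothetical target set: the variants from the question, normalized to lower case.
    private static final Set<String> TARGETS = Set.of("java", "java developer");

    // True if any element, trimmed and lower-cased, equals one of TARGETS.
    static boolean hasJavaSkill(List<String> skills) {
        if (skills == null) {
            return false; // the skills column is nullable
        }
        for (String s : skills) {
            // containsNull = true in the schema, so skip null elements
            if (s != null && TARGETS.contains(s.trim().toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample rows modeled on the df.show() output above.
        List<List<String>> rows = List.of(
                List.of("Java", "java"),
                List.of("Java Developer"), // only the visible element of the truncated row
                List.of("dev"));
        for (List<String> row : rows) {
            System.out.println(row + " -> " + hasJavaSkill(row));
        }
    }
}
```

Comparing against a set of exact, normalized values prevents entries such as "dev" or "javascript" from matching. A substring test such as `LIKE '%java%'` would accept "javascript".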