rohanagarwal - 10 months ago
Java Question

WrappedArray of WrappedArray to Java array

I have a column of type set. Reading it through the Spark Dataset API returns a WrappedArray of WrappedArrays. I want a single array containing all the values of the nested wrapped arrays. How can I do that?

E.g., Cassandra table:


I'm using the Spark Dataset API, which returns a WrappedArray of WrappedArrays.

Answer Source

Suppose you have a Dataset<Row> ds which has a value column.

|value                  |
|[WrappedArray(1, 2, 3)]|

And it has the schema below:

 |-- value: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: integer (containsNull = false)

Using UDF

Define a UDF1 like below:

static UDF1<WrappedArray<WrappedArray<Integer>>, List<Integer>> getValue = new UDF1<WrappedArray<WrappedArray<Integer>>, List<Integer>>() {
      public List<Integer> call(WrappedArray<WrappedArray<Integer>> data) throws Exception {
        List<Integer> intList = new ArrayList<Integer>();
        for (int i = 0; i < data.size(); i++) {
          // Copy each inner WrappedArray's elements into the flat list
          intList.addAll(JavaConversions.seqAsJavaList(data.apply(i)));
        }
        return intList;
      }
    };
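The flattening the UDF performs can be exercised outside Spark with plain Java lists instead of Scala WrappedArrays. A minimal sketch (the class name `Flatten` is illustrative, not part of the answer's code):

```java
import java.util.ArrayList;
import java.util.List;

public class Flatten {
    // Flatten a nested list into a single list, mirroring the UDF's loop.
    public static List<Integer> flatten(List<List<Integer>> data) {
        List<Integer> intList = new ArrayList<>();
        for (List<Integer> inner : data) {
            intList.addAll(inner);
        }
        return intList;
    }

    public static void main(String[] args) {
        // [[1, 2, 3], [4, 5]] becomes [1, 2, 3, 4, 5]
        System.out.println(flatten(List.of(List.of(1, 2, 3), List.of(4, 5))));
    }
}
```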

Register and call the UDF1 like below:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.callUDF;
import scala.collection.JavaConversions;

//register UDF
spark.udf().register("getValue", getValue, DataTypes.createArrayType(DataTypes.IntegerType));

//Call UDF
Dataset<Row> ds1 ="*"), callUDF("getValue", col("value")).as("udf-value"));

Using explode function

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

Dataset<Row> ds2 ="value")).as("explode-value"));
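Note that exploding the outer array yields one row per inner array, not a single flat array; the per-element flattening is what a stream flatMap expresses. A minimal plain-Java sketch of that semantics (the class name `ExplodeSketch` is illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExplodeSketch {
    // Flatten two levels at once with flatMap, analogous to
    // exploding the nested array down to individual elements.
    public static List<Integer> flatten(List<List<Integer>> rows) {
                   .flatMap(List::stream)       // each inner list contributes its elements
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [[1, 2, 3], [4]] becomes [1, 2, 3, 4]
        System.out.println(flatten(List.of(List.of(1, 2, 3), List.of(4))));
    }
}
```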