Jose Jose - 8 months ago 54
Java Question

scala.collection.Seq doesn't work on Java


  • Apache Spark 2.0.1

  • Java 7

On the Apache Spark Java API documentation for the class DataSet appears an example to use the method join using a scala.collection.Seq parameter to specify the columns names. But I'm not able to use it.
On the documentation they provide the following example:

df1.join(df2, Seq("user_id", "user_name"))

Error: Can not find Symbol Method Seq(String)

My Code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.collection.Seq;

public class UserProfiles {

public static void calcTopShopLookup() {
Dataset<Row> udp = Spark.getDataFrameFromMySQL("my_schema","table_1");

Dataset<Row> result = Spark.getSparkSession().table("table_2").join(udp,Seq("col_1","col_2"));


Seq(x, y, ...) is a Scala way to create sequence. Seq has it's companion object, which has apply method, which allows to not write new each time.

It should be possible to write:

import scala.collection.JavaConversions;
import scala.collection.Seq;

import static java.util.Arrays.asList;

Dataset<Row> result = Spark.getSparkSession().table("table_2").join(udp, JavaConversions.asScalaBuffer(asList("col_1","col_2")));`

Or you can create own small method:

 public static <T> Seq<T> asSeq(T... values) {
        return JavaConversions.asScalaBuffer(asList(values));