rahul rahul - 13 days ago 8
Java Question

How to parse a multi-line JSON file into a Dataset in Apache Spark (Java)

Is there any way to parse a multi-line JSON file using Dataset?
Here is the sample code:

public static void main(String[] args) {

// creating spark session
SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic example")
.config("spark.some.config.option", "some-value").getOrCreate();

Dataset<Row> df = spark.read().json("D:/sparktestio/input.json");
df.show();
}


It works perfectly if the JSON is on a single line, but I need it to work for a multi-line file.

My JSON file:

{
"name": "superman",
"age": "unknown",
"height": "6.2",
"weight": "flexible"
}

Answer
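    // warehouseLocation and filePath are assumed to be defined elsewhere.
    SparkSession spark = SparkSession.builder().appName("Java Spark Hive Example")
            .config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate();

    // wholeTextFiles yields one (path, content) pair per file, so a multi-line JSON document stays in one record.
    JavaRDD<Tuple2<String, String>> javaRDD = spark.sparkContext().wholeTextFiles(filePath, 1).toJavaRDD();

    List<Tuple2<String, String>> collect = javaRDD.collect();
    String everything = collect.get(0)._2(); // content of the first file
    System.out.println("everything =  " + everything);

    // Possible next step (an assumption, not in the original answer): parse each file's content as one JSON record.
    Dataset<Row> df = spark.read().json(javaRDD.map(t -> t._2()));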
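Once the whole file is available as one string, another option is to collapse it to a single line so that it matches the one-record-per-line layout the question says already works. The sketch below uses plain Java with no Spark dependency. The class name `MultiLineJson` and the method `toSingleLine` are hypothetical names chosen for illustration. It assumes the input is valid JSON, where string values cannot contain raw line breaks, so joining lines does not change the data.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark.
class MultiLineJson {

    // Collapse a pretty-printed JSON document into one line.
    // Assumes valid JSON: line breaks only occur between tokens, never inside strings.
    static String toSingleLine(String json) {
        return Arrays.stream(json.split("\\R"))   // split on any line break
                .map(String::trim)                 // drop indentation
                .filter(line -> !line.isEmpty())   // skip blank lines
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String pretty = "{\n"
                + "\"name\": \"superman\",\n"
                + "\"age\": \"unknown\",\n"
                + "\"height\": \"6.2\",\n"
                + "\"weight\": \"flexible\"\n"
                + "}";
        System.out.println(toSingleLine(pretty));
    }
}
```

If you are on Spark 2.2 or later, the built-in reader option is probably simpler than either workaround: `spark.read().option("multiLine", true).json("D:/sparktestio/input.json")` reads each file as a single JSON document.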