Uri Goren - 5 months ago
Java Question

Spark `FileAlreadyExistsException` when `saveAsTextFile` even though the output directory doesn't exist

I am running this command line:

hadoop fs -rm -r /tmp/output


And then a Java 8 Spark job with this main():


SparkConf sparkConf = new SparkConf();
JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaRDD<JSONObject> rdd = sc.textFile("/tmp/input")
.map(s -> new JSONObject(s));
rdd.saveAsTextFile("/tmp/output");
sc.stop();


And I get this error:

ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/output already exists


Any idea how to fix it?

Answer

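You removed the directory from HDFS, but Spark is trying to save to the local file system. A path with no scheme, such as /tmp/output, is resolved against whatever default file system the job's Hadoop configuration specifies (fs.defaultFS), and that may not be the HDFS you cleaned with hadoop fs -rm.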

To save to HDFS, pass a fully qualified URI:

rdd.saveAsTextFile("hdfs://<URL-hdfs>:<PORT-hdfs>/tmp/output");

For a single-node setup on localhost, the address is often:

rdd.saveAsTextFile("hdfs://localhost:9000/tmp/output");
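A quick way to see which file system a path string names is to look at its URI scheme. The sketch below uses only java.net.URI and mimics, in simplified form, how a scheme-less absolute path ends up depending on a default file system. The class name OutputPathCheck, the helper qualify, and the hdfs://localhost:9000 address are illustrative assumptions, not part of Spark or Hadoop.

```java
import java.net.URI;

// Hypothetical helper, not a Spark or Hadoop API: it only illustrates
// that a path without a scheme depends on a default file system.
class OutputPathCheck {

    // Returns the path unchanged if it already names a scheme (hdfs://, file://, ...).
    // Otherwise prefixes it with the given default file system URI.
    // Assumes the path is absolute, i.e. starts with "/".
    static String qualify(String defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return path;
        }
        // Strip one trailing slash from the default to avoid a doubled separator.
        String base = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        return base + path;
    }

    public static void main(String[] args) {
        // Same path string, two different targets depending on the default.
        System.out.println(qualify("hdfs://localhost:9000", "/tmp/output")); // prints hdfs://localhost:9000/tmp/output
        System.out.println(qualify("file:///", "/tmp/output"));              // prints file:///tmp/output
        // An explicit scheme is left alone, whatever the default is.
        System.out.println(qualify("file:///", "hdfs://localhost:9000/tmp/output"));
    }
}
```

Passing the fully qualified form to saveAsTextFile removes the dependence on the default, which is the point of the suggestion above.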

Another solution is to remove /tmp/output from your local file system.
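If you prefer to clear the local directory from Java rather than from a shell, a minimal sketch with java.nio could look like this. The class LocalOutputCleaner and the method deleteRecursively are hypothetical names for illustration. The code only touches the local disk of the machine it runs on, so it helps only if that is where the stale /tmp/output lives.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Spark or Hadoop.
class LocalOutputCleaner {

    // Deletes a local directory tree. Returns false if the path did not exist.
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order lists children before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Run this on the machine whose local disk holds the stale directory.
        boolean removed = deleteRecursively(Paths.get("/tmp/output"));
        System.out.println(removed ? "removed /tmp/output" : "/tmp/output did not exist");
    }
}
```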

Best regards
