Uri Goren, 5 years ago
Java Question

Spark `FileAlreadyExistsException` when `saveAsTextFile` even though the output directory doesn't exist

I am running this command line:

hadoop fs -rm -r /tmp/output

And then a Java 8 Spark job with this code:

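SparkConf sparkConf = new SparkConf();
JavaSparkContext sc = new JavaSparkContext(sparkConf);
// Parse each input line into a JSON object
JavaRDD<JSONObject> rdd = sc.textFile("/tmp/input")
    .map(s -> new JSONObject(s));
// Write the result; this is the call that throws
rdd.saveAsTextFile("/tmp/output");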

And I get this error:

ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/output already exists

Any idea how to fix it?

Answer Source

You removed the directory from HDFS, but Spark is trying to save to the local file system.

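To save to HDFS, pass `saveAsTextFile` a fully qualified URI of the form `hdfs://<namenode-host>:<port>/tmp/output` instead of the bare path `/tmp/output`.

On localhost, the host and port are whatever `fs.defaultFS` is set to in your Hadoop configuration; the Hadoop single-node setup guide uses `hdfs://localhost:9000`.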
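As a minimal sketch of the fully qualified output path: the helper below is hypothetical (`HdfsPaths.qualify` is not part of Spark or Hadoop), and the host `localhost` and port `9000` in the usage line are assumptions that should be replaced by your cluster's `fs.defaultFS` value. It only uses the JDK to build the URI string.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Spark or Hadoop: builds a fully
// qualified HDFS URI so the output location does not depend on how
// the job resolves a bare path such as "/tmp/output".
final class HdfsPaths {
    private HdfsPaths() {}

    static String qualify(String host, int port, String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + absolutePath);
        }
        try {
            // URI(scheme, userInfo, host, port, path, query, fragment)
            return new URI("hdfs", null, host, port, absolutePath, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

In the job from the question this might be used as `rdd.saveAsTextFile(HdfsPaths.qualify("localhost", 9000, "/tmp/output"));`, again assuming that host and port match your NameNode.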


Another solution is to remove /tmp/output from your local file system.
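If you would rather clear the local directory from Java than from a shell, the following is one possible sketch. `LocalOutputCleaner` is a hypothetical name, and the code uses only `java.nio.file`, so it touches the local file system of the machine it runs on and nothing in HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a local output directory and its contents.
final class LocalOutputCleaner {
    private LocalOutputCleaner() {}

    // Returns false if the directory did not exist, true after deleting it.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories,
            // so every directory is empty by the time it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

Calling `LocalOutputCleaner.deleteRecursively(java.nio.file.Paths.get("/tmp/output"))` before the save would then play the same role as `rm -r /tmp/output` on that machine.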

Best regards
