Joe Joe - 12 days ago
Scala Question

Can I write a plain text HDFS (or local) file from a Spark program, not from an RDD?

I have a Spark program (in Scala) and a SparkContext. I am writing some files with RDD's saveAsTextFile. On my local machine I can use a local file path and it works with the local file system. On my cluster it works with HDFS.
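
For reference, the RDD route looks like this (a minimal sketch; the paths are hypothetical placeholders, not from the question):

// Writing an RDD as text; Spark resolves the path against the
// default file system (local on my machine, HDFS on the cluster).
val rdd = sparkContext.parallelize(Seq("line 1", "line 2"))
rdd.saveAsTextFile("/tmp/output")              // local file system
rdd.saveAsTextFile("hdfs:///user/joe/output")  // HDFS on the cluster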

I also want to write other arbitrary files as the result of processing. I'm writing them as regular files on my local machine, but want them to go into HDFS on the cluster.

SparkContext seems to have a few file-related methods, but they all seem to be for input, not output.

How do I do this?

Joe Joe
Answer

Thanks to marios and kostya, but there are a few steps to writing a text file into HDFS from Spark.

// Imports needed for the Hadoop file system API and buffered output.
import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.BufferedOutputStream

// The Hadoop configuration is accessible from the SparkContext.
val fs = FileSystem.get(sparkContext.hadoopConfiguration)

// An output file can be created from the file system.
// Here, filename is the target path, local or HDFS.
val output = fs.create(new Path(filename))

// But a BufferedOutputStream must be used to output an actual text file.
val os = new BufferedOutputStream(output)

os.write("Hello World".getBytes("UTF-8"))

os.close()
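
If you need to write several lines, it can be more convenient to wrap the same stream in a character writer. This is just a variant of the snippet above, not part of the original answer:

import java.io.{OutputStreamWriter, PrintWriter}

// Same fs.create(...) stream as above, wrapped for line-oriented text output.
val writer = new PrintWriter(new OutputStreamWriter(fs.create(new Path(filename)), "UTF-8"))
writer.println("Hello World")
writer.println("Goodbye World")
writer.close()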

Note that FSDataOutputStream, which has been suggested, extends Java's DataOutputStream; it is a binary data output stream, not a text output stream. Its writeUTF method appears to write plain text, but it actually writes a binary serialization format ("modified UTF-8") that includes extra length bytes.
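
You can see the extra bytes with a small check against a plain DataOutputStream, which behaves the same way as FSDataOutputStream here:

import java.io.{ByteArrayOutputStream, DataOutputStream}

val buf = new ByteArrayOutputStream()
val dos = new DataOutputStream(buf)
dos.writeUTF("Hi")
dos.close()
// buf.toByteArray is Array(0, 2, 72, 105):
// a 2-byte length prefix (0, 2) followed by 'H' (72) and 'i' (105).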
