kruparulz14 - 3 months ago
Scala Question

Writing to a file in Apache Spark

I am writing Scala code that needs to write to a file in HDFS. When I use FileWriter.write locally, it works, but the same call does not work against HDFS.
Upon checking, I found that Apache Spark offers the following options for writing: RDD.saveAsTextFile and DataFrame.write.format.

My question is: what if I just want to write an int or string to a file in Apache Spark?

Follow up:
I need to write a header, the DataFrame contents, and then an appended string to an output file. Does sc.parallelize(Seq(<String>)) help?

Answer

Create an RDD from your data (int or string) by wrapping it in a Seq; see the "Parallelized Collections" section of the Spark programming guide for details:

sc.parallelize(Seq(5))  //for writing int (5)
sc.parallelize(Seq("Test String")) // for writing string

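import org.apache.spark.{SparkConf, SparkContext}

// Create a single SparkContext and reuse it; "local" runs Spark in-process.
val conf = new SparkConf().setAppName("Writing to File").setMaster("local")
val sc = new SparkContext(conf)

// Writing an int. saveAsTextFile creates a directory of part files at the
// given path and fails if that path already exists.
val intRdd = sc.parallelize(Seq(5))
intRdd.saveAsTextFile("out/int/test") // forward slashes also work for HDFS paths

// Writing a string.
val stringRdd = sc.parallelize(Seq("Test String"))
stringRdd.saveAsTextFile("out/string/test")

sc.stop()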
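
For the follow-up (header, then DataFrame contents, then an appended string), the same sc.parallelize idea can be extended. The following is only a minimal sketch under some assumptions: the names ReportWriter and writeReport are hypothetical, rows are joined with a comma, and the output is small enough to fit in a single partition. Each line is tagged with a sort key so that the order does not depend on how Spark happens to arrange partitions.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark: writes header, rows, then footer.
object ReportWriter {
  def writeReport(df: DataFrame, header: String, footer: String, path: String): Unit = {
    val sc = df.sparkSession.sparkContext

    // Key each line with (section, position) so the final order is explicit:
    // section 0 = header, 1 = DataFrame rows, 2 = trailing string.
    val headerRdd = sc.parallelize(Seq(((0, 0L), header)))
    val bodyRdd = df.rdd
      .map(_.mkString(","))            // assumption: comma-separated columns
      .zipWithIndex()                  // keeps the DataFrame's current row order
      .map { case (line, i) => ((1, i), line) }
    val footerRdd = sc.parallelize(Seq(((2, 0L), footer)))

    headerRdd.union(bodyRdd).union(footerRdd)
      .sortByKey(ascending = true, numPartitions = 1) // one sorted partition
      .values                                         // drop the sort keys
      .saveAsTextFile(path)                           // fails if path exists
  }
}
```

A call might look like ReportWriter.writeReport(df, "id,value", "end of report", "out/report"), where the path is a placeholder. Because everything is sorted into one partition, the output directory should contain a single part file; for large DataFrames this funnels all data through one task, so it is only reasonable for modest output sizes.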