animal animal - 1 year ago 154
Scala Question

Source.fromFile not working for HDFS file path

I am trying to read the contents of a file stored in HDFS using Source.fromFile(). It works fine when the file is on the local file system, but it throws an error when I try to read the file from HDFS.

import scala.io.Source

object CheckFile {
  def main(args: Array[String]): Unit =
    for (line <- Source.fromFile("/user/cloudera/xxxx/File").getLines()) { /* loop body not shown in the post */ }
}

Error: hdfs:/quickstart.cloudera:8080/user/cloudera/xxxx/File (No such file or directory)

I searched but could not find a solution to this.

Please help

Answer

If you are using Spark, you should use SparkContext to load the file. Source.fromFile reads only from the local file system, so it cannot resolve an HDFS location.
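To see why the error message shows hdfs:/ with a single slash, here is a minimal sketch that uses only the standard library. It assumes the string passed to Source.fromFile was the hdfs:// form of the URL in the error message; the object and method names (LocalPathDemo, asLocalPath, tryRead) are made up for illustration. On a Unix-like system, java.io.File should treat the whole string as a local path and collapse the double slash, and the read should then fail because no such local file exists.

```scala
import java.io.File
import scala.io.Source
import scala.util.Try

object LocalPathDemo {
  // Host, port, and path copied from the error message in the question;
  // the "hdfs://" double-slash form is an assumption about what was passed in.
  val hdfsUrl = "hdfs://quickstart.cloudera:8080/user/cloudera/xxxx/File"

  // java.io.File has no notion of URL schemes: the whole string is a local path name.
  def asLocalPath(s: String): String = new File(s).getPath

  // Try to read the string as a local file, capturing any exception instead of crashing.
  def tryRead(s: String): Try[List[String]] = Try {
    val src = Source.fromFile(s)
    try src.getLines().toList finally src.close()
  }

  def main(args: Array[String]): Unit = {
    println(asLocalPath(hdfsUrl)) // the path as the local file system sees it
    println(tryRead(hdfsUrl))     // a Failure, unless such a local file really exists
  }
}
```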

Say you have your SparkContext at sc,

val fromFile = sc.textFile("hdfs://path/to/file.txt")

This should do the trick, although you might have to include the NameNode host and port in the URL (hdfs://host:port/path/to/file.txt).
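For completeness, here is a rough sketch of a whole program built around that call, assuming Spark is on the classpath. The object name ReadHdfsFile and the helper readLines are hypothetical, and the HDFS URL is taken from the first command-line argument rather than hard-coded, since the right host and port depend on your cluster. The local[*] master is only a fallback for running outside spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ReadHdfsFile {
  // Thin wrapper: the path (file:// or hdfs://) is the only thing that changes.
  def readLines(sc: SparkContext, path: String): RDD[String] = sc.textFile(path)

  def main(args: Array[String]): Unit = {
    // Expected form: hdfs://<namenode-host>:<port>/path/to/file (supplied by the caller).
    val path = args(0)
    val conf = new SparkConf()
      .setAppName("ReadHdfsFile")
      .setIfMissing("spark.master", "local[*]") // assumption: fall back to local mode
    val sc = new SparkContext(conf)
    try {
      val lines = readLines(sc, path)
      println(s"line count: ${lines.count()}")
    } finally sc.stop() // always release the context
  }
}
```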


To add to the comment: you want to read some data from HDFS and store it as a Scala collection. This is bad practice, because the file might contain millions of lines and the program will crash once the driver runs out of memory; you should work with RDDs rather than built-in Scala collections. Nevertheless, if this is what you want, you could do:

val fromFile = sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray

This produces a local collection of the desired type (an Array in this case).
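If the whole file is not needed on the driver, one way to stay within the RDD advice above is to compute on the RDD and bring back only a small result. The sketch below is illustrative only: RddSummary, nonEmptyLineCount, and preview are hypothetical names, and it assumes an RDD[String] such as the one returned by sc.textFile.

```scala
import org.apache.spark.rdd.RDD

object RddSummary {
  // The filter and count run on the executors; only a single Long returns to the driver.
  def nonEmptyLineCount(lines: RDD[String]): Long =
    lines.filter(_.trim.nonEmpty).count()

  // Brings at most n lines to the driver, so memory use is bounded by n.
  def preview(lines: RDD[String], n: Int): Array[String] = lines.take(n)
}
```

For example, RddSummary.preview(sc.textFile("hdfs://path/to/file.txt"), 10) would return at most ten lines as a local Array.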