I am new to Spark and Scala and need to load a file from HDFS into Spark. I have a file in HDFS:
hdfs dfs -cat /newhdfs/abc.txt
spark-shell # this opens the Scala console
scala> import org.apache.spark._; //Line 1
scala> val conf=new SparkConf().setMaster("local[*]");
scala> val sc = new SparkContext(conf);
scala> val input=sc.textFile("hdfs:///newhdfs/abc.txt"); //Line 4
input: org.apache.spark.rdd.RDD[String] = hdfs:///newhdfs/abc.txt MapPartitionsRDD at textFile at <console>:27
This is not an error. The line only reports the type of input and the path the RDD was created from.
In the Basics section of the Spark docs, there is this example:
scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD at textFile at <console>:25
which demonstrates the very same behavior.
How would you expect an error to happen without an action triggering actual work? textFile is lazy: it only records where the data lives, and nothing is read until an action runs.
If you want to check that everything is OK, call count on your input RDD. count is an action, so it triggers the actual read of the file and then returns the number of elements (lines) in the RDD.
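As a minimal sketch of that check, assuming Spark is on the classpath: the object name LazyReadCheck, the helper countLines, and the app name are made up for illustration and are not part of Spark. The helper wraps count in a Try so that a bad path shows up as a Failure instead of an uncaught exception.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.{Failure, Success, Try}

object LazyReadCheck {
  // Hypothetical helper: builds the RDD (lazy) and then forces the read with count().
  def countLines(sc: SparkContext, path: String): Try[Long] = {
    // Transformation only: nothing is read here, so even a bad path is accepted.
    val input = sc.textFile(path)
    // Action: this is where the file is actually opened and any error surfaces.
    Try(input.count())
  }

  def main(args: Array[String]): Unit = {
    // Inside spark-shell, skip these two lines and reuse the `sc` the shell already provides.
    val conf = new SparkConf().setMaster("local[*]").setAppName("lazy-read-check")
    val sc = SparkContext.getOrCreate(conf)

    countLines(sc, "hdfs:///newhdfs/abc.txt") match {
      case Success(n) => println(s"Read OK: $n lines")
      case Failure(e) => println(s"Read failed: ${e.getMessage}")
    }

    sc.stop()
  }
}
```

In spark-shell itself, the same check is just input.count() typed after Line 4. If the path is wrong, the exception appears at that point rather than at textFile.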