
Cannot resolve symbol split in Spark job

I'm running a Spark application in the IntelliJ IDE as a Maven project.
I'm trying to create a row RDD, convert it to a DataFrame, and store it in HDFS.


Spark version: 1.5.2
Scala version: 2.10.4


My code:

val rowRDD = dataframename.map(_.split("\t")).map(p => Row(p(0), p(1), p(2), p(3)))


The compiler reports that value split is not a member of my class package, and also that the application does not take any parameters.

I think there is some dependency issue, and I need help with it.

Note: I have already defined the schema for rowRDD.

Thanks for your support.

Answer

From the Spark DataFrame documentation:

map[R](f: (Row) ⇒ R)(implicit arg0: ClassTag[R]): RDD[R]
"Returns a new RDD by applying a function to all rows of this DataFrame."

So when you call map on a DataFrame, you are mapping over Row objects, which do not have a split method.
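
A minimal sketch of the fix, assuming the raw tab-separated line sits in the DataFrame's first column as a String (the variable name dataframename comes from your question; the column index 0 is an assumption about your schema):

import org.apache.spark.sql.Row

// _ is a Row here, so extract the String with getString before splitting.
// Index 0 assumes the raw line is stored in the first column.
val rowRDD = dataframename.map(_.getString(0).split("\t"))
                          .map(p => Row(p(0), p(1), p(2), p(3)))

This keeps the rest of your pipeline intact: getString is part of the Row API, and the result is the same RDD[Row] you were building.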

See the DataFrame and Row documentation for details.
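
Alternatively, if your data starts out as a plain text file, you can skip the DataFrame and read it as an RDD[String], on which split is available directly. A sketch, with a placeholder HDFS path:

// Each element of an RDD[String] is one line of the file, so split works here.
val rowRDD = sc.textFile("hdfs:///path/to/input")
               .map(_.split("\t"))
               .map(p => Row(p(0), p(1), p(2), p(3)))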
