user1122 - 2 months ago
Scala Question

Pass an RDD as a parameter to a function and return a DataFrame - Scala

I am trying to create a function that takes a String or an RDD as an argument and returns a DataFrame.

Code:

def udf1(input: String) = {
  val file = sc.textFile(input)
  file.map(p => Person(
    p.substring(1, 15),
    p.substring(16, 20))).toDF()
}

def main() {
  case class Person(id: String, name: String)
  val df1 = udf1("hdfs:\\")
}


But it always returns an RDD. Any suggestions?

Answer

I'm not sure exactly why your code isn't working, but good Scala form includes specifying return types, so that the compiler reports an error if the function body does not produce a DataFrame:

scala> import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.DataFrame

scala> case class Person(id: Int)
defined class Person

scala> def udf1(fName: String): DataFrame = {
     | val file = sc.textFile(fName)
     | file.map(p => Person(p.toInt)).toDF()
     | }
udf1: (fName: String)org.apache.spark.sql.DataFrame

scala> val df = udf1("file.txt")
df: org.apache.spark.sql.DataFrame = [id: int]
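
Outside the shell, one likely cause of trouble in the question's code is that Person is declared inside main, so it is not in scope in udf1, and toDF() generally needs the case class to be defined outside any method. Below is a minimal sketch of a compiled version covering both the String and the RDD case. It assumes Spark 2.x or later with a SparkSession (on Spark 1.x the import would be sqlContext.implicits._ instead); the names PersonFrames, linesToDF, pathToDF and Demo are hypothetical, and the length filter is an added assumption to avoid substring errors on short lines.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Top-level case class: visible to every function and usable by toDF().
case class Person(id: String, name: String)

object PersonFrames {
  // RDD in, DataFrame out. The declared return type makes the compiler
  // reject a body that ends in anything other than a DataFrame.
  def linesToDF(spark: SparkSession, lines: RDD[String]): DataFrame = {
    import spark.implicits._ // brings toDF() into scope
    lines
      .filter(_.length >= 20) // assumption: skip lines too short to slice
      .map(p => Person(p.substring(1, 15), p.substring(16, 20)))
      .toDF()
  }

  // String (path) in, DataFrame out: read the file, then reuse linesToDF.
  def pathToDF(spark: SparkSession, path: String): DataFrame =
    linesToDF(spark, spark.sparkContext.textFile(path))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("person-frames")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()

    // An in-memory RDD stands in for a real file here.
    val lines = spark.sparkContext.parallelize(Seq("x" * 20))
    val df = PersonFrames.linesToDF(spark, lines)
    df.printSchema()

    spark.stop()
  }
}
```

Keeping the RDD-based function as the core and the path-based one as a thin wrapper means the parsing logic can be exercised with a small in-memory RDD instead of a file on HDFS.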