user1122 - 4 months ago
Scala Question

Pass an RDD as a parameter to a function and return a DataFrame - Scala

I am trying to create a function that takes a string or an RDD as an argument and returns a DataFrame.


def udf1(input: String) = {
  val file = sc.textFile(input)
  file.map(p => Person(
    p.substring(1, 15),
    p.substring(16, 20))).toDF()
}

def main() {
  case class Person(id: String, name: String)
  val df1 = udf1("hdfs:\\")
}

But it always returns an RDD. Any suggestions?


Not sure exactly why your code isn't working, but good Scala form would include specifying return types:

scala> case class Person(id: Int)
defined class Person

scala> def udf1(fName: String): DataFrame = {
     | val file = sc.textFile(fName)
     | file.map(p => Person(p.toInt)).toDF()
     | }
udf1: (fName: String)org.apache.spark.sql.DataFrame

scala> val df = udf1("file.txt")
df: org.apache.spark.sql.DataFrame = [id: int]
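
For a compiled application rather than the shell, the same idea can be sketched as follows. This is a minimal sketch, not a definitive fix: it assumes Spark 2.x or later with a SparkSession (the question uses the older `sc` entry point), and the names `PersonLoader`, `fromRdd` and `fromPath` are hypothetical. One plausible cause of trouble in the question is that `Person` is declared inside `main`, where `udf1` cannot see it and where Spark generally cannot derive the conversion that `toDF()` needs, so the sketch declares it at the top level. The substring indices are copied unchanged from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Declared at the top level (not inside a method) so that both functions
// can see it and Spark can derive an encoder for it.
case class Person(id: String, name: String)

// Hypothetical helper object; the names are illustrative only.
object PersonLoader {

  // Takes an RDD of fixed-width lines and returns a DataFrame.
  // The explicit return type makes the compiler reject anything else.
  def fromRdd(lines: RDD[String])(implicit spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF() into scope
    lines
      .map(p => Person(p.substring(1, 15), p.substring(16, 20)))
      .toDF()
  }

  // Takes a path, reads it as text, and reuses the RDD variant.
  def fromPath(path: String)(implicit spark: SparkSession): DataFrame =
    fromRdd(spark.sparkContext.textFile(path))
}

object Demo {
  def main(args: Array[String]): Unit = {
    implicit val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("demo").getOrCreate()

    // Made-up sample line, 20 characters long, so both substring calls are in range.
    val rdd = spark.sparkContext.parallelize(Seq("#ABCDEFGHIJKLMN#John"))
    val df: DataFrame = PersonLoader.fromRdd(rdd)
    df.show()

    spark.stop()
  }
}
```

Splitting the work into an RDD variant and a path variant covers both kinds of argument mentioned in the question without resorting to an untyped parameter.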