Jake - 26 days ago
Scala Question

Scala/Spark: Read in RDD[(String,Int)]

I have the following text file (previously written out from an RDD[(String,Int)]):

(ARCHITECTURE,50)
(BUSINESS,17)
(CHEMICAL ENGINEERING,6)
(CHILD DEVELOPMENT,43)
(CIVIL ENGINEERING,26)
etc


I can read it in as an RDD[String] like this:

spark.sparkContext.textFile(s"$path\\${fileName}_labelNames")


But how can I read it in as an RDD[String,Int]? Is that possible?

EDITED:
Fixed error in RDD type above

Answer Source

RDD[String, Int] is not a valid type: RDD takes a single type parameter, so that will not compile.

What you probably mean is RDD[(String, Int)], that is, an RDD of (String, Int) tuples.

Here is how you can build it from the lines you already read:

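val data = original.map { record =>
  // Drop the surrounding parentheses, then split "KEY,VALUE" on the comma.
  // Note: this assumes the key itself contains no comma.
  val a = record.stripPrefix("(").stripSuffix(")").split(",")
  val k = a(0)
  val v = a(1).toInt
  (k, v)
}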

Here original is the RDD[String] you got from textFile, and data is an RDD[(String, Int)].
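
If some lines might be malformed (a stray "etc", a blank line) or a key might contain a comma, the same idea can be made a bit more defensive. The following is only a sketch: LabelLine and parse are hypothetical names, not part of Spark, and it assumes the value is always the text after the last comma.

```scala
import scala.util.Try

object LabelLine {
  // Hypothetical helper: parses one line such as "(CIVIL ENGINEERING,26)".
  // Returns None for lines that do not have the expected shape.
  def parse(line: String): Option[(String, Int)] = {
    val body = line.trim.stripPrefix("(").stripSuffix(")")
    // Split on the LAST comma, so a key may itself contain commas.
    val i = body.lastIndexOf(',')
    if (i < 0) None
    else
      Try(body.substring(i + 1).trim.toInt).toOption
        .map(v => (body.substring(0, i), v))
  }
}
```

With the RDD[String] from the question, something like original.flatMap(line => LabelLine.parse(line)) should then give an RDD[(String, Int)] and silently drop the lines that do not parse.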