Jake - 26 days ago

Scala/Spark: Read in RDD[(String,Int)]

I have the following text file (previously written out from an RDD[(String,Int)]):


I can read it in as an RDD[String] like this:

spark.sparkContext.textFile(s"$path\\${fileName}_labelNames")

But how can I read it in as an RDD[String,Int]? Is that possible?

Edit: fixed an error in the RDD type above.

Answer

There is no RDD[String, Int]: RDD takes a single type parameter, so that type does not compile.

What you probably mean is RDD[(String, Int)], an RDD whose elements are (String, Int) tuples.

Here is how you can convert the original lines into that type:

val data = original.map { record =>
      // each line looks like "(key,value)", the toString of a Tuple2
      val a = record.stripPrefix("(").stripSuffix(")").split(",")
      val k = a(0)
      val v = a(1).trim.toInt
      (k, v)
    }

where original is the RDD[String] you read from the file with textFile.
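The split(",") above breaks if a key itself contains a comma, and a malformed line throws and fails the whole job. A minimal sketch of a more defensive parser, assuming every valid line has the form "(key,value)" with an integer value after the last comma (PairParser and parseLine are hypothetical names, not part of Spark):

```scala
import scala.util.Try

object PairParser {
  // Parses one line such as "(apple,3)" into ("apple", 3).
  // Splits at the LAST comma so keys containing commas stay intact;
  // returns None for lines that do not match the assumed format.
  def parseLine(line: String): Option[(String, Int)] = {
    val body = line.trim.stripPrefix("(").stripSuffix(")")
    val idx = body.lastIndexOf(',')
    if (idx < 0) None
    else
      Try(body.substring(idx + 1).trim.toInt).toOption
        .map(v => (body.substring(0, idx), v))
  }

  def main(args: Array[String]): Unit = {
    // Plain Scala demo, no Spark needed
    val lines = Seq("(apple,3)", "(a,b,7)", "not a pair")
    lines.flatMap(l => parseLine(l)).foreach(println)
  }
}
```

In Spark you could then write original.flatMap(line => PairParser.parseLine(line)) to get an RDD[(String, Int)] that skips malformed lines instead of failing.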