anthurium - 2 months ago
Scala Question

Scala: creating key-value pairs from a text file with multiple values per key

How can I create key-value pairs in the following format?

Sample input in a text file:


X: a b c

Y: f g


I want the output to be key-value pairs stored in an RDD:


(X,a)
(X,b)
(X,c)
(Y,f)
(Y,g)


EDIT:

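import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("KeyValuePairs")
val sprk = new SparkContext(conf)
val in = sprk.textFile("sample_input.txt")
// Splitting "X: a b c" on whitespace gives ("X:", "a"):
// the key keeps its ':' and only the first value is kept
val tuples = in.map { s =>
  val parts = s.split("\\s+")
  (parts(0), parts(1))
}.distinct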
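To see why the attempt above yields only one pair per line, here is a plain-Scala sketch comparing map and flatMap on the sample lines. It assumes no Spark: a local Seq stands in for the RDD, and the object name MapVsFlatMap is hypothetical. map always produces exactly one element per input line, while flatMap lets each line produce any number of pairs that are then flattened together.

```scala
object MapVsFlatMap {
  // Local stand-in for the RDD returned by textFile (sample lines from the question)
  val lines: Seq[String] = Seq("X: a b c", "Y: f g")

  // map: exactly one output element per input line,
  // so only the first value survives and the key keeps its ':'
  val viaMap: Seq[(String, String)] = lines.map { s =>
    val parts = s.split("\\s+")
    (parts(0), parts(1))
  }

  // flatMap: each line may produce several pairs,
  // which are flattened into a single collection
  val viaFlatMap: Seq[(String, String)] = lines.flatMap { s =>
    val parts = s.split("\\s+").toList
    val key = parts.head.stripSuffix(":")
    parts.tail.map(value => (key, value))
  }

  def main(args: Array[String]): Unit = {
    println(viaMap)     // List((X:,a), (Y:,f))
    println(viaFlatMap) // List((X,a), (X,b), (X,c), (Y,f), (Y,g))
  }
}
```

RDDs expose the same map and flatMap methods, so the same distinction applies there.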

Answer

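First split each line on ':' to separate the label from the rest, then split the rest on whitespace with \\s+. Use flatMap rather than map so that one line can produce several pairs.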

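// sc is the SparkContext (predefined in spark-shell)
val textFile = sc.textFile("hdfs://...")

// For "X: a b c", emit (X,a), (X,b), (X,c); flatMap flattens the per-line arrays into one RDD
val pairs = textFile
  .filter(_.contains(":"))                      // skip blank or malformed lines
  .flatMap { line =>
    val Array(label, rest) = line.split(":", 2) // split only on the first ':'
    val items = rest.trim.split("\\s+").filter(_.nonEmpty)
    items.map(item => (label.trim, item))
  }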
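The parsing logic can also be pulled into a standalone function and tested without a cluster. This is a minimal sketch under the assumption that blank lines and lines without ':' should simply produce no pairs; the names KeyValuePairs and parseLine are hypothetical. A local Seq stands in for the RDD, and with Spark the same function would be passed as textFile.flatMap(parseLine).

```scala
object KeyValuePairs {
  // Parses one line such as "X: a b c" into (X,a), (X,b), (X,c).
  // Lines without ':' (for example blank lines) yield no pairs.
  def parseLine(line: String): Seq[(String, String)] =
    line.split(":", 2) match {
      case Array(label, rest) =>
        rest.trim.split("\\s+").toSeq
          .filter(_.nonEmpty)                 // drop the empty token when nothing follows ':'
          .map(item => (label.trim, item))
      case _ => Seq.empty
    }

  def main(args: Array[String]): Unit = {
    // Local stand-in for the RDD; with Spark: textFile.flatMap(parseLine)
    val lines = Seq("X: a b c", "", "Y: f g")
    println(lines.flatMap(parseLine))
    // List((X,a), (X,b), (X,c), (Y,f), (Y,g))
  }
}
```

Defining the function in a top-level object keeps the Spark closure free of references to non-serializable enclosing classes.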