Sotos - 3 months ago
Scala Question

Get unique RDD strings

I have created the following sample RDD:

val rdd = sc.parallelize(List(
  "something1@domainA.com",
  "something2@domainA.com",
  "something3@domainB.com"))

I then split each address at "@":

val rdd1 = rdd.map(_.split("@")) // RDD[Array[String]]
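Each element of rdd1 is the Array[String] that String.split returns for one address. The sketch below uses plain Scala strings with no Spark involved; the object name SplitDemo is illustrative, not from the question. It shows what that array contains and why a literal dot has to be written as "\\.": the String overload of split takes a regular expression, and an unescaped "." matches any character.

```scala
object SplitDemo {
  // "@" is not a regex metacharacter, so this splits at the literal "@".
  val parts: Array[String] = "something1@domainA.com".split("@")
  // parts is Array("something1", "domainA.com")

  // "." must be escaped as "\\." to split at a literal dot.
  val labels: Array[String] = parts(1).split("\\.")
  // labels is Array("domainA", "com")

  // Unescaped, every character is a delimiter; split then drops the
  // trailing empty strings, leaving an empty array.
  val unescaped: Array[String] = parts(1).split(".")
  // unescaped is empty
}
```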


What I am trying to do now is get a new RDD containing only the distinct domains, i.e. the equivalent of:

val finalrdd = sc.parallelize(List("domainA", "domainB"))


I found this post, but I couldn't get it to work.

Answer

Try:

rdd.map(_.split("@")).flatMap { case Array(_, d) => d.split("\\.").headOption }.distinct
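The chain above does three things: map splits each address at "@", the flatMap pattern Array(_, d) binds the part after "@" and keeps only its first dot-separated label (headOption yields an Option, which flatMap flattens away when it is empty), and distinct removes duplicates. Because Scala's standard collections offer map, flatMap and distinct with matching semantics, the pipeline can be checked without a SparkContext. The sketch below runs it on a List; the object and method names are illustrative. The domainsSafe variant adds a `case _ => None` fallback, which is not part of the answer: without it, an element that does not split into exactly two parts (for example a string with no "@") fails the pattern and throws a MatchError when the job runs.

```scala
object DomainPipeline {
  val emails: List[String] = List(
    "something1@domainA.com",
    "something2@domainA.com",
    "something3@domainB.com"
  )

  // Same chain as the answer, applied to a List instead of an RDD.
  def domains(xs: List[String]): List[String] =
    xs.map(_.split("@"))
      .flatMap { case Array(_, d) => d.split("\\.").headOption }
      .distinct

  // Variant with a fallback case: entries that do not split into exactly
  // two parts are dropped instead of throwing a MatchError.
  def domainsSafe(xs: List[String]): List[String] =
    xs.map(_.split("@"))
      .flatMap {
        case Array(_, d) => d.split("\\.").headOption
        case _           => None
      }
      .distinct
}
```

One difference to keep in mind: List.distinct keeps first-occurrence order, whereas RDD.distinct shuffles the data and makes no ordering guarantee, so compare the RDD result as a set (for example after collect()).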