Ravi Ranjan - 6 months ago
Scala Question

Splitting strings and creating a hash map from the results in Scala

I have a CSV file with two columns, like this:

column1 column2
sachin@@@tendulkar@@@Ganguly cricket@@@India@@@players

I want to convert it to a hash map which would be like this:

sachin-> "cricket, India, players"
tendulkar-> "cricket, India, players"
Ganguly-> "cricket, India, players"

Each value ("cricket, India, players") should be a single string. How can I do this in Scala?
This is what I have done so far:

val csv = sc.textFile("/tag/players.csv")
val headerAndRows = csv.map(line => line.split(",").map(_.trim))
val header = headerAndRows.first()
val synonyms = csv.map(_.split(",")).map(p => p(0)) // for column1 (arrays are 0-indexed)
val targettag = csv.map(_.split(",")).map(p => p(1)) // for column2
val splitsyno = synonyms.map(x => x.split("@@@"))
val splittarget = targettag.map(x=>x.split("@@@"))

How do I proceed from here to create the desired hash map?


The code below works for a single line. Once that works, you can apply it to every line and merge the results if you want to. I've hardcoded the provided row.

First, it splits the line into a tuple of the two columns. Second, it replaces each '@@@' in column2 with ','. Third, it splits column1 at '@@@', pairs each piece with the column2 string, and converts the resulting pairs to a Map.

The solution can probably be optimized further.

val data = "sachin@@@tendulkar@@@Ganguly, cricket@@@India@@@players"

// Split into the two columns and trim the space left after the comma
val (c1: String, c2: String) = data.split(",").map(_.trim) match {
  case Array(a, b) => (a, b)
}
val c2s = c2.replace("@@@", ",")
val xx = c1.split("@@@").map(_ -> c2s).toMap

// Just to validate the output
xx.foreach(f => println(f._1 + "->" + f._2))
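
The answer mentions merging all lines but only shows a single row. A minimal sketch of that merge step, using plain Scala collections rather than Spark, might look like the following. The object name TagMap, the helper parseLine, and the second data row are assumptions made up for illustration; the values are joined with ", " to match the output requested in the question.

```scala
// Sketch only: rows are hardcoded, and the second row is hypothetical.
object TagMap {
  // Turn one "column1, column2" line into (synonym -> tags) pairs.
  // Lines that do not split into exactly two columns are skipped.
  def parseLine(line: String): Seq[(String, String)] =
    line.split(",").map(_.trim) match {
      case Array(synonyms, tags) =>
        // Join the tags into one string, e.g. "cricket, India, players"
        val tagString = tags.split("@@@").mkString(", ")
        synonyms.split("@@@").toSeq.map(_ -> tagString)
      case _ => Seq.empty
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "sachin@@@tendulkar@@@Ganguly, cricket@@@India@@@players",
      "federer@@@nadal, tennis@@@players" // hypothetical second row
    )
    // flatMap collects the pairs from every line; toMap merges them
    val merged: Map[String, String] = lines.flatMap(parseLine).toMap
    merged.foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

If the same key appears on more than one line, toMap keeps the last value seen. A comma-separated header line would also be turned into a pair, so it would need to be filtered out first. With the RDD from the question, the same helper could presumably be applied as csv.flatMap(TagMap.parseLine).collectAsMap(), assuming the map is small enough to collect on the driver.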