yoru yoru - 2 months ago 8
Scala Question

How to turn the output into an object instead of using println



I want to build an object from each result instead of printing it with println.


The text file looks like this:

"nagano,apple"

"nagano,pear"

"texas,grapefruit"

"rio,guava"

"rio,guava"

and the result should look like this:

"(nagano,apple,1)"

"(nagano,pear,1)"

"(texas,grapefruit,1)"

"(rio,guava,2)"

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {

  val conf = new SparkConf()
    .setAppName("WordCount")
    .setMaster("local")
  val sc = new SparkContext(conf)

  // read text info
  val textfile = sc.textFile("C:\\fruitbox.csv")
  textfile.filter(_.nonEmpty)           // drop empty lines
    .map { line => (line, 1) }          // pair each line with a count of 1
    .reduceByKey(_ + _)                 // sum the counts of identical lines
    .foreach(println) // ← want to do something about this row
}


But instead of printing each result, I want to fill an object with it, something like this:

.foreach(
fruitbox.setCity(_.split(",")[0])
fruitbox.setApple(_.split(",")[1])
...
)


This seems like a simple matter of syntax, but I couldn't figure it out.

Answer

You need to think of it in functional terms or you will go crazy. Replace the foreach with a map that takes a function of this form:

.map(myInputTuple => MyCaseClass(myInputTuple._1, myInputTuple._2, myInputTuple._3))
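For the data in the question, each element after reduceByKey is a pair of the original line and its count, so the mapping function has to split the line itself. Below is a minimal sketch of that idea. The names FruitCount, FruitCounts, toFruitCount and countLines are made up for illustration, the code assumes every non-empty line has exactly two comma-separated fields, and it uses plain Scala collections as a stand-in for the RDD so that it runs without Spark.

```scala
// Hypothetical case class; the name and fields are assumptions for illustration.
final case class FruitCount(city: String, fruit: String, count: Int)

object FruitCounts {
  // Turns one ("city,fruit", count) pair into a FruitCount.
  // Scala tuple fields are 1-based: _1 is the line, _2 is the count.
  // Assumes the line contains exactly two comma-separated fields.
  def toFruitCount(pair: (String, Int)): FruitCount = {
    val fields: Array[String] = pair._1.split(",")
    FruitCount(fields(0), fields(1), pair._2)
  }

  // Plain-collection stand-in for the filter / map / reduceByKey pipeline.
  def countLines(lines: Seq[String]): Seq[FruitCount] = {
    val pairs: Seq[(String, Int)] =
      lines.filter(_.nonEmpty).map(line => (line, 1))
    val counts: Map[String, Int] =
      pairs.groupBy(_._1).map { case (line, ps) => (line, ps.map(_._2).sum) }
    counts.toSeq.map(toFruitCount) // element order is not guaranteed
  }
}
```

With Spark, the same function could presumably be passed as .map(FruitCounts.toFruitCount) right after .reduceByKey(_ + _), in place of .foreach(println), which would give an RDD of FruitCount values instead of printed lines.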

Do you know case classes? If you don't, you should give them a look and define your own to hold the data you are dealing with. Otherwise, if you want to build an instance of an existing class that has setters, you can do it this way:

.map(myInputTuple => {
  val myInstance = new MyClass()
  myInstance.setField1(myInputTuple._1) // tuple fields are 1-based
  ...
  myInstance
}
)

Mind the braces: {} define a block in which you can write imperative code, and the last expression in the block is the value it returns, in this case the instance of your class.
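To make the block form concrete, here is a small sketch with a setter-based class. FruitBox and its setters are hypothetical stand-ins for whatever existing class you have (the question's fruitbox is not shown, so its real API is unknown), and a plain Seq stands in for the RDD so the snippet runs without Spark.

```scala
// Hypothetical mutable class with Java-style setters; an assumption
// standing in for an existing class such as the question's fruitbox.
class FruitBox {
  private var city: String = ""
  private var fruit: String = ""
  private var count: Int = 0
  def setCity(value: String): Unit = { city = value }
  def setFruit(value: String): Unit = { fruit = value }
  def setCount(value: Int): Unit = { count = value }
  def getCity: String = city
  def getFruit: String = fruit
  def getCount: Int = count
}

object FruitBoxExample {
  // Sample (line, count) pairs shaped like the output of reduceByKey.
  val counted: Seq[(String, Int)] = Seq(("nagano,apple", 1), ("rio,guava", 2))

  val boxes: Seq[FruitBox] = counted.map { pair =>
    val fields = pair._1.split(",")
    val box = new FruitBox() // a fresh instance for every element
    box.setCity(fields(0))
    box.setFruit(fields(1))
    box.setCount(pair._2)
    box                      // last expression: the value of the block
  }
}
```

Creating the instance inside the block gives each element its own object. Calling setters on one shared instance from foreach, as in the question's sketch, would only keep overwriting the same fields.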

Also, when you post code about Spark, try to make the types you're dealing with explicit at every step, so it's easier for others to write code that helps you.