ScazzoMatto - 24 days ago
Scala Question

Word count using Spark and Scala

I have to write a program in Scala, using Spark, that counts how many times a word occurs in a text. But when I use an RDD, my variable count always displays 0 at the end. Can you help me, please?
This is my code:

import scala.io.StdIn
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object wordcount {
  def main(args: Array[String]): Unit = {
    // set spark context
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val distFile = sc.textFile("bible.txt")

    print("Enter word to look for in the HOLY BIBLE: ")
    val word = StdIn.readLine()
    var count = 0
    println("You entered " + word)

    // Count the occurrences of word in the text
    for (bib <- distFile.flatMap(_.split(" "))) {
      if (word == bib) {
        count += 1
      }
    }
    println(word + " occurs " + count + " times in the HOLY BIBLE!")
  }
}

Answer

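Your for loop desugars to distFile.flatMap(...).foreach(...), and Spark runs that foreach body on the executors, not on the driver: the closure, including count, is serialized and sent to each task, so each task increments its own copy and the count on the driver stays 0. Instead of updating a driver-side variable, I suggest you use the transformations and actions RDD already provides (your approach does no harm, it just never sends the result back to the driver). For example, the following code retrieves the count for a single word: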

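// Assumes sc is the SparkContext from the question and import scala.io.StdIn
val word = StdIn.readLine()
println("You entered " + word)
val input = sc.textFile("bible.txt")
// Split each line into words and keep only exact matches of word
val matchingWords = input.flatMap(line => line.split(" "))
                         .filter(x => x.equals(word))

// count() is an action: it runs the job and returns the number of matches to the driver
println(matchingWords.count())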
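Because the RDD API deliberately mirrors the Scala collections API, the flatMap / filter / count chain above can be tried on a plain Seq before running it on Spark. The sketch below does that with no Spark dependency; LocalWordMatch, countExact and countNormalized are hypothetical names, and the two sample lines are only illustrative input. It also shows one thing to watch for: splitting on " " keeps punctuation attached, so "earth." is not counted as "earth".

```scala
// A Spark-free sketch of the answer's pipeline on an in-memory Seq[String].
// All names here are hypothetical; only the split/filter/count idea comes from the answer.
object LocalWordMatch {

  // Mirrors the answer: split each line on single spaces and count exact matches.
  def countExact(lines: Seq[String], word: String): Int =
    lines.flatMap(_.split(" ")).count(_ == word)

  // Assumed variant, not part of the answer: split on runs of non-letter
  // characters, so that "earth." or "form," lose their trailing punctuation.
  def countNormalized(lines: Seq[String], word: String): Int =
    lines.flatMap(_.split("[^\\p{L}]+")).count(_ == word)

  def main(args: Array[String]): Unit = {
    // Illustrative sample text only
    val lines = Seq(
      "In the beginning God created the heaven and the earth.",
      "And the earth was without form, and void;"
    )
    println(countExact(lines, "earth"))      // 1: "earth." on the first line does not match
    println(countNormalized(lines, "earth")) // 2
  }
}
```

The same split function can be passed to input.flatMap in the Spark version if punctuation should be ignored; whether that is the right notion of "word" for this text is a judgment call.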

Please refer to this link for more information about the internals of Spark.
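The reason count stays at 0 is that Spark normally serializes the loop body, together with everything it captures, and runs a deserialized copy inside each task (the Spark programming guide covers this under "Understanding closures"). The plain-Scala sketch below imitates that with Java serialization; it is a simplified model, not Spark code. Counter and IncrementTask are hypothetical stand-ins for the captured var count and the count += 1 loop body, and ClosureCopyDemo is an illustrative name.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical stand-in for the heap cell that holds a captured `var count`.
final class Counter(var value: Int) extends Serializable

// Hypothetical stand-in for the loop body `count += 1`: it only holds a reference to the counter.
final class IncrementTask(counter: Counter) extends Serializable {
  def run(): Int = { counter.value += 1; counter.value }
}

object ClosureCopyDemo {

  // Java-serialize and deserialize a value, roughly what happens when a task
  // is shipped from the driver to an executor.
  def roundTrip[T](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray)) {
      // Resolve classes with this demo's class loader; fall back to the default lookup.
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, ClosureCopyDemo.getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Runs the shipped copy `matches` times.
  // Returns (value seen by the shipped copy, value left on the "driver").
  def simulate(matches: Int): (Int, Int) = {
    val driverCount = new Counter(0)
    val shipped = roundTrip(new IncrementTask(driverCount))
    var last = 0
    for (_ <- 1 to matches) last = shipped.run()
    (last, driverCount.value)
  }

  def main(args: Array[String]): Unit = {
    val (onExecutor, onDriver) = simulate(3)
    println(s"copy counted $onExecutor, driver still has $onDriver")
  }
}
```

If you want to keep the loop rather than use filter(...).count(), Spark's accumulators (sc.longAccumulator, available since Spark 2.0) are the supported way to send counts from tasks back to the driver.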