Siva Siva - 8 months ago 193
Java Question

Twitter sentiment analysis using Naive Bayes in Apache Spark

I am trying to do a basic Twitter sentiment analysis using Apache Spark.

The page below explains the Naive Bayes function in Apache Spark, which looks like a candidate for this problem.

In the Java example there,
the training and test sets are given as

JavaRDD<LabeledPoint> training = ... // training set
JavaRDD<LabeledPoint> test = ... // test set

I don't have any clue what data type they are, but I understand that they are some kind of non-English input rather than plain text.

I have a list of tweets, say:

"I love my country."

"Great day at office."

"Google Chrome sucks!"

How do I use the Naive Bayes function to process this text?

Any insights on this would be helpful.


LabeledPoint has the form (double, Vector), where the first parameter is the label and the second is a vector of features built from a double[] (for Naive Bayes, only non-negative real values). Your data does not match that format, which means you have to find a way to convert each tweet to a vector of real values. TF-IDF seems to be one way. You might be interested to read this example for better understanding.
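
To make the conversion step concrete, here is a minimal sketch in plain Java, with no Spark dependency, of the term-frequency half of TF-IDF using the hashing trick. The class `TweetFeatures`, the method `termFrequencies`, the vector size of 16, and the labels (1.0 for positive, 0.0 for negative) are all hypothetical and chosen only for illustration; the IDF reweighting is left out. In Spark itself, MLlib ships `HashingTF` and `IDF` classes for this purpose, and the resulting array could be wrapped with `Vectors.dense(...)` and passed to a `LabeledPoint`.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper (not part of Spark): turns raw text into a
// fixed-length vector of non-negative term counts via the hashing trick.
class TweetFeatures {

    // Returns a vector of length numFeatures where each slot holds the
    // number of tokens whose hash falls into that slot.
    static double[] termFrequencies(String text, int numFeatures) {
        double[] features = new double[numFeatures];
        // Lowercase, then split on anything that is not a letter or digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            // floorMod keeps the index non-negative even for negative hashes.
            int index = Math.floorMod(token.hashCode(), numFeatures);
            features[index] += 1.0;
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tweets = {
            "I love my country.",
            "Great day at office.",
            "Google Chrome sucks!"
        };
        // Assumed hand-assigned labels: 1.0 = positive, 0.0 = negative.
        double[] labels = {1.0, 1.0, 0.0};

        for (int i = 0; i < tweets.length; i++) {
            double[] features = termFrequencies(tweets[i], 16);
            // Each (label, features) pair has the shape LabeledPoint expects.
            System.out.println(labels[i] + " -> " + Arrays.toString(features));
        }
    }
}
```

A vector size as small as 16 will cause different words to collide in the same slot; a real feature space would be much larger, and the labels would have to come from a labeled training set of tweets rather than being hard-coded.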