I am trying to do basic Twitter sentiment analysis using Apache Spark.
The page below explains the Naive Bayes function in Apache Spark, which looks like a candidate for this problem.
If you look at the Java example,
the training and test sets are given as
JavaRDD<LabeledPoint> training = ... // training set
JavaRDD<LabeledPoint> test = ... // test set
LabeledPoint has the form
(double, Vector), where the first parameter is the label and the second is a Vector of features (only non-negative real values). Your data is raw tweet text, so it does not match this format, which means you have to find a way to convert it to real values. TF-IDF seems to be one way. You might be interested in reading this example for a better understanding.
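To make the conversion concrete, here is a minimal sketch of TF-IDF in plain Java. It is not Spark's API: the class TfIdfSketch and all of its methods are hypothetical names made up for illustration, and the tokenizer, the hashing into a fixed number of buckets, and the smoothed IDF formula are assumptions. In Spark itself you would more likely use the HashingTF and IDF classes from org.apache.spark.mllib.feature. The point is only to show how a tweet becomes a vector of non-negative doubles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hand-rolled TF-IDF, not the Spark MLlib implementation.
public class TfIdfSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Term frequencies via the hashing trick: each token is counted in the
    // bucket given by its hash code modulo numFeatures.
    static double[] termFrequencies(List<String> tokens, int numFeatures) {
        double[] tf = new double[numFeatures];
        for (String t : tokens) {
            tf[Math.floorMod(t.hashCode(), numFeatures)] += 1.0;
        }
        return tf;
    }

    // Smoothed inverse document frequency: log((N + 1) / (df + 1)).
    // Because df <= N, the result is never negative.
    static double[] inverseDocumentFrequencies(List<double[]> tfs, int numFeatures) {
        double[] df = new double[numFeatures];
        for (double[] tf : tfs) {
            for (int i = 0; i < numFeatures; i++) {
                if (tf[i] > 0) {
                    df[i] += 1.0;
                }
            }
        }
        double[] idf = new double[numFeatures];
        for (int i = 0; i < numFeatures; i++) {
            idf[i] = Math.log((tfs.size() + 1.0) / (df[i] + 1.0));
        }
        return idf;
    }

    // One TF-IDF vector per document, each of length numFeatures.
    static List<double[]> tfIdf(List<String> docs, int numFeatures) {
        List<double[]> tfs = new ArrayList<>();
        for (String doc : docs) {
            tfs.add(termFrequencies(tokenize(doc), numFeatures));
        }
        double[] idf = inverseDocumentFrequencies(tfs, numFeatures);
        List<double[]> result = new ArrayList<>();
        for (double[] tf : tfs) {
            double[] v = new double[numFeatures];
            for (int i = 0; i < numFeatures; i++) {
                v[i] = tf[i] * idf[i];
            }
            result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two made-up tweets, hashed into 16 buckets.
        List<String> tweets = Arrays.asList("good good day", "bad day");
        for (double[] v : tfIdf(tweets, 16)) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```

Each resulting double[] could then be wrapped with Vectors.dense(...) and paired with a sentiment label in a LabeledPoint. Note that a term occurring in every document (here "day") gets weight 0, which is still a valid non-negative feature value.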