So, I'm trying to do multiclass text classification. I have read a lot of old questions and blog posts, but I still can't fully understand the concept.
I tried some examples from this blog post as well: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/
But when it comes to multiclass classification, I don't quite understand it. Let's say I want to classify text into multiple languages: French, English, Italian and German. I want to use Naive Bayes, which I think would be the easiest to start with. From what I have read in the old questions, the simplest solution would be to use one-vs-all, so each language would have its own model. I would then have four models, one each for French, English, Italian and German. Then I would run a text against every model and check which one gives the highest probability. Am I correct?
But when it comes to coding, the example above has tweets like these, each of which is labeled either positive or negative.
pos_tweets = [('I love this car', 'positive'),
('This view is amazing', 'positive'),
('I feel great this morning', 'positive'),
('I am so excited about tonight\'s concert', 'positive'),
('He is my best friend', 'positive')]
neg_tweets = [('I do not like this car', 'negative'),
('This view is horrible', 'negative'),
('I feel tired this morning', 'negative'),
('I am not looking forward to tonight\'s concert', 'negative'),
('He is my enemy', 'negative')]
[('Bon jour', 'French'),
('je m\'appelle', 'French'),
('My name', 'English')]
There's no need for a one-vs-all scheme with Naive Bayes; it's a multiclass model out of the box. Just feed a list of (sample, label) pairs to the classifier learner, where label denotes the language.
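To make that concrete, here is a minimal from-scratch sketch of a multiclass Naive Bayes learner in plain Python. It is not NLTK's implementation, only an illustration under some assumptions: the function names (`words`, `train`, `classify`), the word-level features, the add-one smoothing, and the toy training sentences are all made up for this example.

```python
import math
from collections import Counter, defaultdict

def words(text):
    # Hypothetical feature extractor: lowercase whitespace tokens.
    return text.lower().split()

def train(pairs):
    # pairs is a list of (sample, label) tuples; any number of labels works.
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sample, label in pairs:
        label_counts[label] += 1
        for w in words(sample):
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def classify(model, text):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Add-one smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # log prior of the label
        for w in words(text):
            score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    # One model scores every label; pick the most probable one.
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only.
training = [
    ("bonjour je m'appelle", "French"),
    ("hello my name is", "English"),
    ("ciao mi chiamo", "Italian"),
    ("hallo ich heisse", "German"),
]

model = train(training)
print(classify(model, "my name is Anna"))  # English
```

A single call to `train` handles all four languages at once, and `classify` simply returns the label with the highest score, so no per-language model is needed. With a real corpus you would want far more data per language, and probably character n-grams rather than whole words as features.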