I've implemented Naive Bayes document classification with good text filtering, and I'm getting acceptable results with good accuracy. Now I need to improve those results using an EM algorithm.
But I don't know whether I can apply the EM algorithm on top of the Naive Bayes results, or whether I should apply it to the data and start over so that I can compare the results.
In either case I need to understand how the EM algorithm applies to this problem, because it's really confusing me.
Any well-explained documents would be appreciated.
EM generally helps you with unlabeled data. If you have some unlabeled data, you basically use it in a cycle like this:

    estimate some initial parameters, perhaps even randomly
    while not converged:
        relabel data using estimates
        update estimates using new labels
If you are doing purely supervised learning, the relabel step blows away the labels you already have, and is likely to make your classification worse.
On the other hand, this is a nice, detailed tutorial on semi-supervised naive bayes for text classification. If you have a small set of labeled documents and a large set of unlabeled documents, you can use the labeled documents to estimate the initial parameters, then run the iterative steps on the unlabeled data, and end up with a better classifier.
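To make the semi-supervised recipe concrete, here is a minimal sketch in Python with NumPy. It is not taken from the tutorial; the function names (`m_step`, `e_step`, `semi_supervised_nb`), the Laplace smoothing constant, the convergence test, and the four-word toy vocabulary are all assumptions made for illustration. It also assumes multinomial naive Bayes on word counts, and it uses soft (probabilistic) labels for the unlabeled documents instead of hard relabeling. The labeled documents keep their labels throughout; only the unlabeled ones are relabeled.

```python
import numpy as np

def m_step(X, resp, alpha=1.0):
    """Update estimates: log priors and log word probabilities per class.

    X is an (n_docs, n_words) count matrix, resp is (n_docs, n_classes)
    and holds each document's class weights (one-hot for labeled docs).
    Assumes every class has at least one labeled document.
    """
    class_weight = resp.sum(axis=0)
    log_prior = np.log(class_weight / class_weight.sum())
    word_counts = resp.T @ X + alpha  # Laplace smoothing (assumed alpha=1)
    log_word = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return log_prior, log_word

def e_step(X, log_prior, log_word):
    """Relabel: return P(class | document) for every row of X."""
    log_joint = X @ log_word.T + log_prior
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def semi_supervised_nb(X_lab, y_lab, X_unl, n_classes, max_iter=50, tol=1e-6):
    resp_lab = np.eye(n_classes)[y_lab]  # labeled docs stay fixed
    # Initial parameters come from the labeled documents only.
    log_prior, log_word = m_step(X_lab, resp_lab)
    X_all = np.vstack([X_lab, X_unl])
    prev = None
    for _ in range(max_iter):
        resp_unl = e_step(X_unl, log_prior, log_word)
        log_prior, log_word = m_step(X_all, np.vstack([resp_lab, resp_unl]))
        # Stop once the soft labels of the unlabeled docs stop moving.
        if prev is not None and np.abs(resp_unl - prev).max() < tol:
            break
        prev = resp_unl
    return log_prior, log_word

# Toy data, invented for illustration. Columns: ball, goal, vote, law.
X_lab = np.array([[3, 2, 0, 0], [0, 0, 2, 3]])
y_lab = np.array([0, 1])  # 0 = sports, 1 = politics
X_unl = np.array([[2, 1, 0, 0], [1, 3, 0, 1], [0, 0, 3, 1], [0, 1, 2, 2]])

log_prior, log_word = semi_supervised_nb(X_lab, y_lab, X_unl, n_classes=2)
X_new = np.array([[2, 2, 0, 0], [0, 0, 1, 2]])
pred = e_step(X_new, log_prior, log_word).argmax(axis=1)
print(pred)
```

If you already have a naive Bayes classifier trained on labeled data, its parameters play the role of the initial `m_step` call here, so you would not be starting over. Whether the extra unlabeled data actually helps is something to check on a held-out labeled set.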