Mina Kolta - 23 days ago 8

PHP Question

I've implemented Naive Bayes document classification with good text filtering, and I get acceptable statistical results with good accuracy. I need to improve my results using an EM algorithm.

**But I don't know whether I can apply the EM algorithm to the Naive Bayes results, or whether I should apply it to the data and start over so that I can compare the results.**

In both cases I need to **understand** how the EM algorithm applies to this problem, because it's really confusing me.

Any well-explained documents would be appreciated.

Answer Source

EM generally helps you with unlabeled data. If you have some unlabeled data, you basically use it in a cycle like this:

```
estimate some initial parameters, perhaps even randomly
while not converged:
    relabel data using estimates
    update estimates using new labels
```
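For Naive Bayes specifically, the two steps in that loop can be written out as equations. This is only a sketch of one common variant (soft, probabilistic relabeling with add-one smoothing); the symbols are assumptions introduced here, not from the question: $n_{dw}$ is the count of word $w$ in document $d$, $V$ is the vocabulary, $K$ is the number of classes, and $D$ is the set of documents.

```latex
% Relabel (E-step): class posterior for each document under the current estimates
P(c \mid d) = \frac{P(c) \prod_{w \in V} P(w \mid c)^{n_{dw}}}
                   {\sum_{c'} P(c') \prod_{w \in V} P(w \mid c')^{n_{dw}}}

% Update (M-step): re-estimate parameters from the (fractional) labels
P(w \mid c) = \frac{1 + \sum_{d \in D} P(c \mid d)\, n_{dw}}
                   {|V| + \sum_{w' \in V} \sum_{d \in D} P(c \mid d)\, n_{dw'}}
\qquad
P(c) = \frac{1 + \sum_{d \in D} P(c \mid d)}{K + |D|}
```

A "hard" version of the loop would replace $P(c \mid d)$ with 1 for the most probable class and 0 for the others.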

If you are doing purely supervised learning, the relabel step overwrites the labels you already have, which is likely to make your classification worse.

On the other hand, this is a nice, detailed tutorial on semi-supervised Naive Bayes for text classification. If you have a small set of labeled documents and a large set of unlabeled documents, you can use the labeled ones to estimate the initial parameters, then run the iterative steps on the unlabeled data, and end up with a better classifier.
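To make the semi-supervised recipe concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial mentioned above. It assumes a multinomial Naive Bayes model with add-one smoothing and soft (probabilistic) relabeling; the function names `train_nb`, `posterior`, and `semi_supervised_nb` and the toy documents are all hypothetical. Labeled documents keep their labels with weight 1 throughout, so only the unlabeled documents are relabeled.

```python
import math


def train_nb(docs, weights, classes, vocab):
    """M-step: fit multinomial Naive Bayes from (possibly fractional) labels.

    docs is a list of token lists; weights holds one {class: weight} dict per doc.
    """
    prior = {c: 1.0 for c in classes}                       # add-one smoothing
    counts = {c: {w: 1.0 for w in vocab} for c in classes}  # add-one smoothing
    for tokens, wts in zip(docs, weights):
        for c, p in wts.items():
            prior[c] += p
            for w in tokens:
                counts[c][w] += p
    total_prior = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total_prior) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(counts[c].values())
        log_like[c] = {w: math.log(n / total) for w, n in counts[c].items()}
    return log_prior, log_like


def posterior(tokens, model, classes):
    """E-step for one document: P(class | tokens) under the current model."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in log_like[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def semi_supervised_nb(labeled, unlabeled, iterations=10):
    """Initialise from labeled docs, then alternate E and M steps."""
    classes = sorted({y for _, y in labeled})
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    lab_docs = [d for d, _ in labeled]
    lab_weights = [{y: 1.0} for _, y in labeled]  # true labels are never changed
    model = train_nb(lab_docs, lab_weights, classes, vocab)
    for _ in range(iterations):  # fixed iteration count instead of a convergence test
        soft = [posterior(d, model, classes) for d in unlabeled]
        model = train_nb(lab_docs + unlabeled, lab_weights + soft, classes, vocab)
    return model, classes


# Toy data, invented purely for illustration.
labeled = [
    ("cheap pills buy now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
]
unlabeled = [
    "buy cheap watches now".split(),
    "agenda for the monday meeting".split(),
    "cheap watches sale".split(),
    "project meeting notes".split(),
]

model, classes = semi_supervised_nb(labeled, unlabeled)
print(posterior("watches sale".split(), model, classes))
```

In this toy setup the words "watches" and "sale" never appear in a labeled document, so a classifier trained on the two labeled examples alone cannot tell the classes apart for that query; any preference it shows after EM comes from the unlabeled documents. A real implementation would also test for convergence (for example on the log-likelihood) rather than run a fixed number of iterations.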