Stanislav Barabanov · 4 years ago
Python Question

What algorithm to choose for binary image classification

Let's say I have two arrays in a dataset:

1) The first one is an array of labels classified as (0,1) - [0,1,0,1,1,1,0.....]

2) And the second array consists of greyscale image vectors with 2500 elements each (numbers from 0 to 300). These numbers are the pixels of 50*50px images. - [[13 160 239 192 219 199 4 60..][....][....][....][....]]

The size of this dataset is quite significant (~12000 elements). I am trying to build a very basic binary classifier that gives reasonable results. Let's say I want to choose a supervised method that is not deep learning. Is that suitable in this case? I've already tried sklearn's SVM with various parameters, but the outcome is unacceptably inaccurate and consists mainly of 1s: [1,1,1,1,1,0,1,1,1,....]

What is the right approach? Isn't the size of the dataset enough to get a good result with a supervised algorithm?
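For concreteness, a rough sketch of the kind of setup described (the data here is randomly generated only to match the shapes mentioned above; the actual code tried is not shown in the question):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Placeholder data matching the described shapes: ~12000 samples, each a
# flattened 50*50 greyscale image with values in [0, 300], plus a 0/1 label.
X = np.random.randint(0, 300, size=(12000, 2500)).astype(float)
y = np.random.randint(0, 2, size=12000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```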

Answer Source

You should probably post this on Cross Validated. But as a direct answer, you should look into sequence-to-sequence learners, since it is already clear to you that SVM is not the ideal solution for this.

You should look into Markov models for sequential learning if you don't want to go the deep learning route; however, neural networks have a very good track record with image classification problems.

Ideally, for sequential learning you should look into Long Short-Term Memory recurrent neural networks, and for your current dataset see whether pre-training on an existing data corpus (say, CIFAR-10) helps.

So my recommendation is to give TensorFlow a try with a high-level library such as Keras/SKFlow. Neural networks are just another tool in your machine learning repertoire, and you might as well give them a real chance.
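As a starting point, a minimal Keras sketch for this 50*50 binary classification task might look like the following (the architecture and hyperparameters are illustrative assumptions, not tuned values, and `X`/`y` are assumed to hold the image vectors and labels from the question):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data in the shapes described in the question.
X = np.random.randint(0, 300, size=(12000, 2500)).astype("float32")
y = np.random.randint(0, 2, size=12000)

# Reshape the flat 2500-element vectors back into 50x50 single-channel images
# and scale the pixel values to roughly [0, 1].
X = X.reshape(-1, 50, 50, 1) / 300.0

model = keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(50, 50, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single probability for the 0/1 label
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=64, validation_split=0.2)
```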

An edit to address your comment:
Your issue there is not a lack of data for the SVM. An SVM will work well on a small dataset, because it is easier for it to fit (or overfit) a separating hyperplane on that data. As you increase the dimensionality of your data, keep in mind that separating it with a hyperplane becomes increasingly difficult (look at the curse of dimensionality). However, if you are set on doing it this way, try some dimensionality reduction such as PCA.
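A sketch of that PCA-then-SVM route with scikit-learn (the number of components is an arbitrary assumption, and the `X_train`/`y_train` split is assumed to come from a setup like the one sketched under the question):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Standardize the 2500 pixel features, project them onto 100 principal
# components, then fit the SVM on the reduced representation.
pca_svm = make_pipeline(
    StandardScaler(),
    PCA(n_components=100),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
pca_svm.fit(X_train, y_train)
print(pca_svm.score(X_test, y_test))
```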

Although here you're bound to find another face-off with neural networks, since Kohonen self-organizing maps do this task beautifully: you could use one to project your data into a lower dimension, allowing the SVM to separate it with greater accuracy (a rough sketch follows below).
I still stand by saying you may be using the incorrect approach.
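A minimal sketch of that idea, assuming the third-party `minisom` package (not part of scikit-learn) and the same `X_train`/`X_test` arrays as above:

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

# Train a small self-organizing map on the pixel vectors and use the grid
# coordinates of each sample's best-matching unit as a 2-D projection.
som = MiniSom(x=10, y=10, input_len=2500, sigma=1.0, learning_rate=0.5)
som.train_random(X_train, num_iteration=1000)

X_train_2d = np.array([som.winner(v) for v in X_train], dtype=float)
X_test_2d = np.array([som.winner(v) for v in X_test], dtype=float)
# X_train_2d / X_test_2d can now be fed to an SVM in place of the raw pixels.
```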
