Yan - 1 month ago 22

R Question

My data produced strange results with svm from the e1071 package in R, so I tried to check whether the R svm gives the same result as WEKA (or Python), since I have used WEKA in the past.

I googled the problem and found a question describing exactly the same confusion, but it has no answer. This is the question.

So I hope I can get an answer here.

To keep things simple, I'm also using the iris data set: I train a model (SMO in WEKA, and svm from the R package e1071) on the whole iris data and test it on the same data.

**WEKA parameters**:

`weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V 10 -W 1 -K "weka.classifiers.functions.supportVector.RBFKernel -G 0.01 -C 250007"`

Other than the defaults, I changed the kernel to RBFKernel to make it consistent with the R function.

The result is:

      a  b  c   <-- classified as
     50  0  0 |  a = Iris-setosa
      0 46  4 |  b = Iris-versicolor
      0  7 43 |  c = Iris-virginica

**R code**:

    library(e1071)
    model <- svm(iris[,-5], iris[,5], kernel="radial", epsilon=1.0E-12)
    res <- predict(model, iris[,-5])
    table(pred = res, true = iris[,ncol(iris)])

The result is:

                true
    pred         setosa versicolor virginica
      setosa         50          0         0
      versicolor      0         48         2
      virginica       0          2        48

I'm not a machine learning person, so I'm guessing the default parameters are very different for these two methods. For example, e1071 has 0.01 as default `epsilon`.

Thanks.

Answer

Refer to http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html for the RWeka parameters for SMO, and use ?svm to find the corresponding parameters for the e1071 svm implementation.
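As a rough guide, the correspondence can be tabulated in base R. This is only a sketch based on a reading of the two help pages, not an official mapping: the `param_map` name and the notes are my own, and options without a direct e1071 counterpart are marked `NA`.

```r
# Approximate correspondence between WEKA SMO options and e1071::svm arguments.
# Assumption: derived from the SMO Javadoc and ?svm; treat it as a guide only.
param_map <- data.frame(
  smo_option   = c("C", "L", "G (RBFKernel)", "N", "P", "V", "W"),
  svm_argument = c("cost", "tolerance", "gamma", "scale", NA, NA, NA),
  note = c(
    "complexity / cost constant",
    "tolerance parameter",
    "RBF gamma; WEKA default 0.01, e1071 default 1/ncol(x)",
    "N = 0 normalizes to [0, 1]; scale = TRUE standardizes",
    "round-off epsilon; svm's epsilon is the regression loss width",
    "folds for fitting logistic models; svm's cross estimates accuracy",
    "random seed; use set.seed() in R"
  ),
  stringsAsFactors = FALSE
)

print(param_map, right = FALSE)
```

In particular, `P` and `V` in SMO do not appear to mean the same thing as `epsilon` and `cross` in svm, so matching their numeric values does not necessarily make the two models equivalent.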

As per ?svm, the e1071 svm function is an interface to libsvm, which solves the underlying quadratic program with its own SMO-type decomposition method.

For multiclass-classification with k levels, k>2, libsvm uses the ‘one-against-one’-approach, in which k(k-1)/2 binary classifiers are trained; the appropriate class is found by a voting scheme. libsvm internally uses a sparse data representation, which is also high-level supported by the package SparseM.

In contrast, according to ?SMO in RWeka, SMO

implements John C. Platt's sequential minimal optimization algorithm for training a support vector classifier using polynomial or RBF kernels. Multi-class problems are solved using pairwise classification.
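Both descriptions refer to pairwise (one-against-one) classification. A minimal base-R sketch of the idea for the three iris species follows; the `votes` vector is made up for illustration, since a real model would produce those pairwise decisions itself.

```r
# Enumerate the k(k-1)/2 class pairs for the iris species (k = 3).
classes <- levels(iris$Species)
pairs <- combn(classes, 2)      # one column per binary classifier
n_classifiers <- ncol(pairs)    # 3 * 2 / 2 = 3

# Hypothetical winners of the three pairwise classifiers for one sample,
# in the same order as the columns of `pairs`.
votes <- c("versicolor", "virginica", "versicolor")

# The predicted class is the one that collects the most votes.
tally <- table(factor(votes, levels = classes))
predicted <- names(which.max(tally))

print(pairs)
print(tally)
print(predicted)
```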

So these are two different implementations, and the results may differ a little. Still, if we set the corresponding hyperparameters to similar values, the confusion matrices are almost the same:

```
library(RWeka)

# SMO with an RBF kernel; output of table() is shown as comments
model.smo <- SMO(Species ~ ., data = iris,
                 control = Weka_control(K = list("RBFKernel", G=2), C=1.0, L=0.001,
                                        P=1.0E-12, N=0, V=10, W=1234))
res.smo <- predict(model.smo, iris[,-5])
table(pred = res.smo, true = iris[,ncol(iris)])
#             true
# pred         setosa versicolor virginica
#   setosa         50          0         0
#   versicolor      0         47         1
#   virginica       0          3        49

library(e1071)

# libsvm-based svm with a radial (RBF) kernel
set.seed(1234)
model.svm <- svm(iris[,-5], iris[,5], kernel="radial", cost=1.0, tolerance=0.001,
                 epsilon=1.0E-12, scale=TRUE, cross=10)
res.svm <- predict(model.svm, iris[,-5])
table(pred = res.svm, true = iris[,ncol(iris)])
#             true
# pred         setosa versicolor virginica
#   setosa         50          0         0
#   versicolor      0         49         1
#   virginica       0          1        49
```
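One reason the two matrices need not match exactly is preprocessing and kernel width. As far as I can tell from the documentation, SMO with `N = 0` rescales each attribute to [0, 1], while svm with `scale = TRUE` standardizes to zero mean and unit variance, and svm's default `gamma` is 1/ncol(x) (0.25 for iris) rather than 2. The base-R sketch below uses hypothetical helper names (`normalize01`, `rbf`) to show how the same pair of flowers gets a different RBF kernel value under each setting.

```r
x <- as.matrix(iris[, -5])

# Min-max normalization to [0, 1] (assumed behaviour of SMO's N = 0).
normalize01 <- function(col) (col - min(col)) / (max(col) - min(col))
x_norm <- apply(x, 2, normalize01)

# Standardization to zero mean and unit variance (e1071's scale = TRUE).
x_std <- scale(x)

# RBF kernel exp(-gamma * ||a - b||^2); the helper name is illustrative only.
rbf <- function(a, b, gamma) exp(-gamma * sum((a - b)^2))

# Same two flowers (rows 1 and 51), different preprocessing and gamma.
k_weka_like  <- rbf(x_norm[1, ], x_norm[51, ], gamma = 2)
k_e1071_like <- rbf(x_std[1, ],  x_std[51, ],  gamma = 1 / ncol(x))

print(c(weka_like = k_weka_like, e1071_like = k_e1071_like))
```

Passing an explicit `gamma` to svm would remove one of the two differences; the scaling difference would remain unless the data are rescaled by hand and scaling is switched off in both tools.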

Also refer to http://stats.stackexchange.com/questions/130293/svm-and-smo-main-differences and https://www.quora.com/Whats-the-difference-between-LibSVM-and-LibLinear