Amandeep Rathee - 6 months ago

R Question

I have a data frame where the variable to be predicted is a factor with 28 possible levels.

I trained three classifiers on the training data set: a support vector machine (SVM), a random forest (RF) and k-nearest neighbors (kNN).

This gives me three prediction vectors, one per algorithm. Each has a good accuracy of about 80-90%.

I want to combine them into an ensemble and predict the final outcome by a vote among the three algorithms.

Note: SVM has the highest accuracy, followed by RF and then kNN.

For example:

| SVM prediction | RF prediction | KNN prediction | Final outcome |
|----------------|---------------|----------------|---------------|
| A | A | C | A |
| D | J | D | D |
| C | C | C | C |
| I | F | K | I (pick SVM's outcome in case of a tie) |

As you can see, what I want is very simple. How can I do this in R? And is there any other way of building an ensemble in this situation?

Answer

There is a statistical term for the result of a popular vote: the mode, i.e. the most frequent value.

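```
SVMprediction <- c('A', 'D', 'C', 'I')
RFprediction  <- c('A', 'J', 'C', 'F')
KNNprediction <- c('C', 'D', 'C', 'K')
# Column order matters: put the most accurate model (SVM) first.
data <- data.frame(SVMprediction, RFprediction, KNNprediction)

# Return the most frequent value in v (the mode).
# unique() keeps first-appearance order and which.max() returns the first
# maximum, so a tie resolves to the leftmost column's prediction.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Apply the vote row by row (one row per observation).
apply(data, 1, getmode)
```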

[1] "A" "D" "C" "I"
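In `getmode`, a tie goes to SVM only because the SVM column comes first. If you would rather not depend on column order, here is a minimal sketch with an explicit fallback column. The names `majority_vote`, `preds` and `fallback` are made up for illustration; they do not come from any package.

```r
# Example predictions from the question (one element per observation).
preds <- data.frame(
  SVMprediction = c("A", "D", "C", "I"),
  RFprediction  = c("A", "J", "C", "F"),
  KNNprediction = c("C", "D", "C", "K")
)

# Hypothetical helper: majority vote with an explicit fallback column for ties.
majority_vote <- function(preds, fallback = "SVMprediction") {
  m <- as.matrix(preds)                    # character matrix, one column per model
  apply(m, 1, function(row) {
    counts  <- table(row)                  # number of votes per label in this row
    winners <- names(counts)[counts == max(counts)]
    # A single winner is returned; otherwise use the fallback model's label.
    if (length(winners) == 1) winners else unname(row[fallback])
  })
}

final <- majority_vote(preds)

# Column order no longer matters: KNN first, SVM last.
final_reordered <- majority_vote(
  preds[, c("KNNprediction", "RFprediction", "SVMprediction")]
)
```

With three models, the only possible tie is three different labels, so the fallback does what the question asks. With more models, the fallback label may not be one of the tied winners, so you might want a different rule there.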

`getmode` works for any number of predictors: just add more prediction columns to the data frame.
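As for other ways to ensemble: one common variant is weighted voting, where each model's vote counts in proportion to how much you trust it. Below is a rough sketch. The weights are placeholder numbers I made up (not your real accuracies), and `weighted_vote` is a hypothetical helper name; in practice you would plug in accuracies measured on a validation set.

```r
# Example predictions from the question (one element per observation).
preds <- data.frame(
  SVMprediction = c("A", "D", "C", "I"),
  RFprediction  = c("A", "J", "C", "F"),
  KNNprediction = c("C", "D", "C", "K")
)

# Assumed weights, for illustration only; replace with validation accuracies.
weights <- c(SVMprediction = 0.90, RFprediction = 0.85, KNNprediction = 0.80)

# Hypothetical helper: pick the label with the largest total weight per row.
weighted_vote <- function(preds, weights) {
  m <- as.matrix(preds)                      # character matrix, one column per model
  w <- weights[colnames(m)]                  # align weights with the columns
  apply(m, 1, function(row) {
    scores <- tapply(w, row, sum)            # summed weight for each distinct label
    names(scores)[which.max(scores)]
  })
}

final_weighted <- weighted_vote(preds, weights)
```

With these weights, two agreeing models still outvote SVM alone, and when all three disagree the SVM label has the largest weight, so the result matches the plain vote on the example data. Very unequal weights would let one model override the other two.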

Does it help?

Source (Stack Overflow)