duckertito duckertito - 8 days ago 5
R Question

Creation of partial response plots: cannot determine names of features

I want to create partial response plots as shown here. I trained my randomForest model as follows (12 features and 1 class variable in total):

fit <- randomForest(as.factor(Y) ~ TIME_1 + TIME_2 + TIME_3 + DURATION_1 + DURATION_2 + DURATION_3 +
VALUE_1 + VALUE_2 + VALUE_3 +
Weekday_1 + Weekday_2 + Weekday_3,
data=train,
importance=TRUE,
ntree=50)


Then I ran this code to get the plots, but it looks like the variable names cannot be detected. In particular, for some reason importanceOrder returns values like 102, while I only have 12 features.

importanceOrder=order(-fit$importance)
importanceOrder
[1] 102 108 101 107 111 129 117 109 100 132 106 110 105 118 122 127 104 130 123 125 103 124 121
[24] 116 115 119 120 126 131 128 112 113 114 36 42 45 35 41 38 63 69 66 34 68 44 75
[47] 74 64 61 58 96 43 99 78 30 2 33 67 37 8 49 1 40 71 3 76 50 73 7
[70] 10 91 51 94 9 97 70 77 25 83 27 28 53 4 82 39 31 59 17 84 93 19 18
[93] 5 92 26 16 85 86 54 11 72 29 20 95 55 56 87 88 22 24 90 89 21 23 48
[116] 46 57 79 81 32 13 6 15 14 98 80 12 65 47 62 52 60

names=rownames(fit$importance)[importanceOrder][1:15]
names
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

par(mfrow=c(5, 3), xpd=NA)
for (name in names)
+ partialPlot(fit, train, eval(name), main=name, xlab=name,ylim=c(-.2,.9))

Error in `[.data.frame`(pred.data, , xname) : undefined columns selected

Answer

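If you look at the structure of fit$importance, it becomes clearer why this is not working: it is a matrix with one row per feature and one column per class, plus MeanDecreaseAccuracy and MeanDecreaseGini. Calling order() on the whole matrix ranks every cell, which is why you get 132 indices for 12 features. Any index above 12 has no matching row name, hence the NAs.

You want to order by a single column of fit$importance (here the last one, MeanDecreaseGini), not the entire matrix.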
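To see the effect without fitting a model, here is a minimal sketch using a made-up 3 x 4 matrix as a stand-in for fit$importance. The feature names f1 to f3, the class names, and all values are hypothetical.

```r
# Toy stand-in for fit$importance: 3 features x 4 columns (all values made up).
imp <- matrix(c(0.10, 0.30, 0.20,    # classA
                0.20, 0.10, 0.40,    # classB
                0.15, 0.20, 0.30,    # MeanDecreaseAccuracy
                5,    9,    7),      # MeanDecreaseGini
              nrow = 3,
              dimnames = list(c("f1", "f2", "f3"),
                              c("classA", "classB",
                                "MeanDecreaseAccuracy", "MeanDecreaseGini")))

# order() on the whole matrix treats it as one vector of 12 cells,
# so the result contains indices far beyond the 3 rows.
whole <- order(-imp)
length(whole)
rownames(imp)[whole][1:3]   # row names do not exist for indices > 3

# order() on a single column ranks the 3 features themselves.
by_gini <- order(-imp[, "MeanDecreaseGini"])
rownames(imp)[by_gini]
```

The first lookup returns only NA because the largest cells all sit in the last column, whose positions in the flattened matrix are 10 to 12.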

library(randomForest)
fit <- randomForest(Species ~ Sepal.Length + Sepal.Width + 
                      Petal.Length + Petal.Width, 
                    data=iris, importance=TRUE, ntree=50)

fit$importance

#                  setosa versicolor  virginica MeanDecreaseAccuracy MeanDecreaseGini
# Sepal.Length 0.05176523 0.03398421 0.05009963           0.04412921        12.464634
# Sepal.Width  0.01846554 0.01564288 0.01006655           0.01486503         3.512521
# Petal.Length 0.23199887 0.23484289 0.33840220           0.27046565        38.386311
# Petal.Width  0.41265955 0.30366844 0.26475770           0.32568744        44.906934

importanceOrder<-order(-fit$importance[,'MeanDecreaseGini'])

names<-rownames(fit$importance)[importanceOrder][1:4]

par(mfrow=c(2, 2), xpd=NA)
for (name in names) partialPlot(fit, iris, eval(name), main=name, xlab=name)
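One more thing to watch for in the original code: indexing with [1:15] when there are only 12 features pads the result with NA, which triggers the same "undefined columns selected" error. Below is a sketch of a variant that avoids this, assuming the randomForest package is installed. It uses the importance() accessor with type = 2 (mean decrease in node impurity) and head() to cap the selection. The do.call() wrapper is my own choice, not part of the answer above: as far as I can tell partialPlot captures its x.var argument unevaluated, and do.call() hands it the column name as a plain string. The seed is arbitrary.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so the run is repeatable
fit <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 50)

# importance(fit, type = 2) returns a one-column matrix (MeanDecreaseGini);
# [, 1] turns it into a named numeric vector, one entry per feature.
gini <- importance(fit, type = 2)[, 1]
ranked <- names(sort(gini, decreasing = TRUE))

# head() returns at most as many names as exist, so no NA padding
# even when asking for 15 out of 4 (or 12) features.
top <- head(ranked, 15)

op <- par(mfrow = c(2, 2))
for (nm in top) {
  # do.call() passes the feature name to partialPlot as a string value
  do.call(partialPlot, list(fit, iris, nm, main = nm, xlab = nm))
}
par(op)  # restore the previous plotting layout
```

For the 12-feature model in the question, the same pattern would apply with train in place of iris and a 4 x 3 layout.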