L.kyunam - 3 months ago
R Question

What does y mean in a decision tree plot in R?

I am using the iris data in R.

I wrote code like this:

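library(party)  # ctree() lives in party (partykit offers it too)
# data = iris is assumed here; pass your training subset if you fit on one
irisctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
plot(irisctree, type = "simple")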


The resulting plot looks like this:
[image: plot of the fitted tree]

What do these values mean?

y=(1,0,0) and y=(0,0.939,0.061), y=(0,0.031,0.969)

jav
Answer

If you look at Species (your response variable) in the iris dataset, you will see that it is a factor with 3 levels:

> unique(iris$Species)
[1] setosa     versicolor virginica 
Levels: setosa versicolor virginica

The levels occur in the above order: setosa, versicolor, virginica. Each y vector in the plot gives, in that same order, the proportion of observations in that terminal node that belong to each level. These proportions are the node's estimated class probabilities, and they sum to 1.
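As a small sketch of that mapping, the values from the plot can be paired with the factor levels by position. The vector y_node below is only the second label from the question typed in by hand (a hypothetical name, not something extracted from the fitted tree).

```r
# Levels of the response, in the order the node labels follow
lv <- levels(iris$Species)

# Second label from the question's plot, typed in by hand (illustrative only)
y_node <- c(0, 0.939, 0.061)
names(y_node) <- lv

y_node                      # proportions, now labelled by species
sum(y_node)                 # proportions within one node add up to 1
names(which.max(y_node))    # the most likely class in this node
```

Read this way, y=(0,0.939,0.061) says the node is mostly versicolor with a small share of virginica.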

To verify this, look at the left split of your tree. It splits at Petal.Length <= 1.9. What is the distribution of the Species when Petal.Length <= 1.9?

prop.table(table(iris[iris$Petal.Length <= 1.9,]$Species))

setosa versicolor  virginica 
     1          0          0

In the above code, I subset on Petal.Length <= 1.9 and then look at the distribution of Species (hence prop.table(table(...))). Every row in that subset is setosa, which matches y=(1,0,0).

Another example: take the right branch of the first split (Petal.Length > 1.9) and then the left branch of the next split (Petal.Width <= 1.6). The result is:

prop.table(table(iris[iris$Petal.Length > 1.9 & iris$Petal.Width <= 1.6,]$Species))

setosa versicolor  virginica 
0.00000000 0.92307692 0.07692308 
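The two checks above follow the same pattern, so they can be wrapped in a small helper. This is only a sketch using base R on the full iris data; node_dist is a hypothetical name, and the rounding to three digits is chosen to mimic the labels in the plot.

```r
# Hypothetical helper: size and class proportions of the rows selected by in_node
node_dist <- function(data, in_node) {
  tab <- table(data$Species[in_node])          # counts per species in the node
  list(n = sum(tab),                           # number of rows in the node
       y = round(as.numeric(prop.table(tab)), 3))  # proportions, in level order
}

left  <- node_dist(iris, iris$Petal.Length <= 1.9)
right <- node_dist(iris, iris$Petal.Length > 1.9 & iris$Petal.Width <= 1.6)

left$y    # 1 0 0
right$y   # 0.000 0.923 0.077
right$n   # 52 rows fall into this subset
```

Returning n alongside y makes it easier to see how a few rows moving in or out of a node change the proportions.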

My numbers here do not match yours. I suspect you fitted the tree on a training set of 100 rows, whereas I am using the entire dataset, which would explain the discrepancy. Correct me if I am wrong.
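To illustrate how a training subset could shift the proportions, here is a rough sketch that draws 100 rows at random and repeats the same check. The seed and the sampling scheme are my assumptions, not your actual split, so the numbers will not reproduce your plot.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
train <- iris[sample(nrow(iris), 100), ]   # assumed 100-row training set

# Same subset condition, applied to the full data and to the sample
full_y  <- prop.table(table(
  iris$Species[iris$Petal.Length > 1.9 & iris$Petal.Width <= 1.6]))
train_y <- prop.table(table(
  train$Species[train$Petal.Length > 1.9 & train$Petal.Width <= 1.6]))

# Side-by-side comparison of the class proportions
round(rbind(full = full_y, train = train_y), 3)
```

If the tree was fitted on a subset, the split points themselves may also differ from the ones used here, which is another reason the values in the plot need not match.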
