Asked by L.kyunam, 4 years ago (R question)

# What does y mean in a decision tree plot in R?

I am using the iris data in R.

I wrote the following code:

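``````
library(party)  # ctree() comes from the party package
irisctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = iris)  # data argument assumed; the original call omitted it
plot(irisctree, type = "simple")
``````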

The plot shows the following values in its terminal nodes. What do they mean?

y = (1, 0, 0), y = (0, 0.939, 0.061), and y = (0, 0.031, 0.969)

Answer

If you look at Species (your response variable) in the `iris` dataset, you will see that it is a factor with 3 levels:

``````
> unique(iris$Species)
[1] setosa     versicolor virginica
Levels: setosa versicolor virginica
``````

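Because the levels are ordered setosa, versicolor, virginica, each y in the plot is the vector of class probabilities for those three levels, in that order: the proportion of the observations in that terminal node that belong to each species. The three values in each y therefore sum to 1.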
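A minimal base-R sketch of how such a y vector can be computed for any subset of rows: tabulate Species over all of its levels (so empty classes still show up as 0), divide by the number of rows, and round to three decimals, which appears to match the precision printed in the simple plot. The helper name `node_y` is hypothetical and not part of the party package.

```r
# Hypothetical helper: class-probability vector for the rows selected by `keep`
node_y <- function(data, keep) {
  species <- data$Species[keep]      # subsetting a factor keeps all of its levels
  counts <- table(species)           # counts in levels() order, zeros included
  setNames(round(as.numeric(counts) / sum(counts), 3), levels(species))
}

levels(iris$Species)  # the order of the entries in y
node_y(iris, iris$Petal.Length <= 1.9)
# setosa 1, versicolor 0, virginica 0
node_y(iris, iris$Petal.Length > 1.9 & iris$Petal.Width <= 1.6)
# setosa 0, versicolor 0.923, virginica 0.077
```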

To verify this, look at the left split of your tree. It splits at `Petal.Length <= 1.9`. What is the distribution of the Species when `Petal.Length <= 1.9`?

``````
prop.table(table(iris[iris$Petal.Length <= 1.9, ]$Species))

    setosa versicolor  virginica
         1          0          0
``````

In the above code, I subset on `Petal.Length <= 1.9`, then look at the distribution of Species (hence `prop.table(table(...))`). All of these flowers (100%) are setosa, which matches y = (1, 0, 0).

Another example: take the right branch of the first split (`Petal.Length > 1.9`), then the left branch of the next split (`Petal.Width <= 1.6`). The result is:

``````
prop.table(table(iris[iris$Petal.Length > 1.9 & iris$Petal.Width <= 1.6, ]$Species))

    setosa versicolor  virginica
0.00000000 0.92307692 0.07692308
``````
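Instead of subsetting one branch at a time, you can label every row with the branch it falls into and tabulate Species per branch with `prop.table(..., margin = 1)`. The sketch below uses only the two cut points discussed above (`Petal.Length <= 1.9` and `Petal.Width <= 1.6`); the actual tree may split further, so the branch labels are a hypothetical simplification rather than the tree's real terminal nodes.

```r
# Branch labels built from the two splits discussed above (assumed cut points)
branch <- with(iris, ifelse(Petal.Length <= 1.9, "PL<=1.9",
                     ifelse(Petal.Width <= 1.6, "PL>1.9 & PW<=1.6",
                                                "PL>1.9 & PW>1.6")))

# One row per branch; columns follow levels(iris$Species), and each row sums to 1
probs <- prop.table(table(branch, iris$Species), margin = 1)
round(probs, 3)
```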

My numbers here do not match yours. I suspect you fitted the tree on a training set of about 100 rows, whereas I am using the entire dataset, which would explain the discrepancy. Correct me if I am wrong.
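To see how a training subset changes the proportions, here is a hedged sketch that draws a hypothetical 100-row training set and recomputes the same branch on it. The seed and sample size are assumptions for illustration, not taken from the question; if the tree is refitted on the subset, the cut points themselves may also move.

```r
set.seed(1)  # arbitrary seed; the question's actual train/test split is unknown
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]

# Same branch as above, computed on the training rows only
keep <- train$Petal.Length > 1.9 & train$Petal.Width <= 1.6
train_probs <- prop.table(table(train$Species[keep]))
train_probs
```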
