Sharp Yan - 1 year ago
R Question

Conflicting splits in CART decision tree

I'm currently using decision trees (CART) in R for classification, with the rpart and rattle packages.

After training my CART tree, I found that some rules conflict with each other. Consider the following tree, with the conflicting rules indicated by the red circle.

[Image: the fitted tree, with the conflicting splits circled in red]

In the parent node the split is CHWC.VLV >= 15; if this is true you go left in the tree, and if it is false you go right. In the left child, the next split is CHWC.VLV < 15. However, based on the splitting rule in the parent node, I wouldn't expect any observations in this part of the tree to have CHWC.VLV < 15.

Does anybody know the cause of this apparent conflict?

Answer

This sort of issue generally comes from displaying the tree's split values with too few digits of precision. As a simple example, consider the following dataset:

library(rpart)       # rpart() fits the CART model
library(rpart.plot)  # prp() plots it
CHWC.VLV <- seq(14, 16, length.out = 10000)
outcome <- ifelse(CHWC.VLV >= 14.97, ifelse(CHWC.VLV <= 15.34, 1, 2), 3)

We can train and plot our CART model with:

mod <- rpart(outcome ~ CHWC.VLV)
prp(mod)  # prp() shows 2 significant digits by default

[Image: the tree plotted with default precision; both splits are shown at 15]

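This appears to be a contradiction, because the left subtree from the root node should contain only values with CHWC.VLV >= 15, yet the next split is CHWC.VLV < 15. The two thresholds are actually different numbers (the data were built with boundaries at 14.97 and 15.34) that both round to 15 at the displayed precision. Plotting with more digits of precision shows that there is, in fact, no contradiction: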

prp(mod, digits=4)

[Image: the same tree plotted with four significant digits]
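
As a cross-check that does not depend on the plot at all, the cut points can also be read straight from the fitted object. The following is a minimal sketch, assuming the same toy data as above and relying on the `index` column of `mod$splits`, which rpart uses to store the numeric threshold of each split; `cuts` is just a name introduced here for illustration.

```r
library(rpart)

# Same toy data as in the example above
CHWC.VLV <- seq(14, 16, length.out = 10000)
outcome <- ifelse(CHWC.VLV >= 14.97, ifelse(CHWC.VLV <= 15.34, 1, 2), 3)
mod <- rpart(outcome ~ CHWC.VLV)

# One row per split; the "index" column holds the numeric cut point
cuts <- mod$splits[, "index"]

# At 2 significant digits the two thresholds become indistinguishable
print(signif(cuts, 2))

# At 4 significant digits they separate again
print(signif(cuts, 4))
```

If the two rounded vectors differ like this on your own model, the "conflict" is only a display artifact, and raising `digits` in the plotting call should be enough to make the tree read consistently.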
