Panchito Panchito - 3 months ago
R Question

In the as.party function how can I clarify which are the indices for the different nodes?

After creating my CART with rpart I proceed to convert it to a party object with the as.party function from the partykit package. The following error appears:


as.party(tree.hunterpb1)


Error in partysplit(varid = which(rownames(obj$split)[j] == names(mf)), :
‘index’ has less than two elements


I can only assume that it refers to the partitioning made by factor variables since, as I understand from the literature, the index applies to factors. My tree looks like this:


tree.hunterpb1
n= 354


node), split, n, deviance, yval
      * denotes terminal node

 1) root 354 244402.100 75.45134
   2) hr.11a14>=49.2125 19 3378.322 33.44274 *
   3) hr.11a14< 49.2125 335 205592.400 77.83391
     6) month=April,February,June,March,May 141 58656.390 68.57493 *
     7) month=August,December,January,July,November,October,September 194 126062.800 84.56338
      14) presion.11a14>=800.925 91 74199.080 81.32755
        28) month=January,November,October 16 9747.934 63.13394 *
        29) month=August,December,July,September 75 58025.190 85.20885 *
      15) presion.11a14< 800.925 103 50069.100 87.42223 *


The traceback shows that the conversion of the first partition to the party class is done correctly, but the second one, based on the factor variable, fails and produces the error above.

This error did not appear previously when working with similar data. I can only assume that the as.party function isn't finding the indices. Any advice on how to solve this would be appreciated.

Answer

Possibly, the problem is caused by the following situation. (Thanks to Yan Tabachek for e-mailing me a similar example.) If one of the partitioning variables passed on to rpart() is a character variable, then it is processed as if it were a factor by rpart() but not by the conversion in as.party(). As a simple example consider this small data set:

d <- data.frame(y = c(1:10, 101:110))
d$x <- rep(c("a", "b"), each = 10)
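
Before fitting, it can help to check which columns are stored as character rather than factor. The following is a minimal base-R sketch that rebuilds the same small data set; the name chr_cols is only illustrative and not part of rpart or partykit:

```r
# Rebuild the small example data set from above.
d <- data.frame(y = c(1:10, 101:110))
d$x <- rep(c("a", "b"), each = 10)

# Names of all columns stored as character vectors.
# Assigning via `$<-` never converts strings to factors,
# so x is a character column here.
chr_cols <- names(d)[vapply(d, is.character, logical(1))]
chr_cols
```

Any column listed here is a candidate for the problem described above if it is used as a partitioning variable.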

Fitting the rpart() tree treats the character variable x as a factor:

library("rpart")
(rp <- rpart(y ~ x, data = d))

## n= 20 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
## 1) root 20 50165.0  55.5  
##   2) x=a 10    82.5   5.5 *
##   3) x=b 10    82.5 105.5 *

However, the as.party() conversion does not work:

library("partykit")
as.party(rp)

## Error in partysplit(varid = which(rownames(obj$split)[j] == names(mf)),  : 
##   'index' has less than two elements

The best fix is to transform x to a factor variable and re-fit the tree. Then the conversion also works smoothly:

d$x <- factor(d$x)
rp <- rpart(y ~ x, data = d)
as.party(rp)

## Model formula:
## y ~ x
## 
## Fitted party:
## [1] root
## |   [2] x in a: 5.500 (n = 10, err = 82.5)
## |   [3] x in b: 105.500 (n = 10, err = 82.5)
## 
## Number of inner nodes:    1
## Number of terminal nodes: 2
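
If a data set has several character columns (for instance, if a variable like month in the question was read in as character, which is only an assumption here), they can all be converted in one step before re-fitting. A minimal base-R sketch; chr_to_factor is a hypothetical helper, not a function from rpart or partykit:

```r
# Hypothetical helper: convert every character column of a
# data frame to a factor and leave all other columns unchanged.
chr_to_factor <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Same toy data as above, with x stored as character.
d2 <- data.frame(y = c(1:10, 101:110))
d2$x <- rep(c("a", "b"), each = 10)

d2 <- chr_to_factor(d2)
str(d2)
```

Re-fitting with rpart(y ~ x, data = d2) and then calling as.party() should behave like the single-variable fix shown above.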

I also added a fix in the development version of partykit on R-Forge to avoid the problem in the first place. It will be included in the next CRAN release (probably 1.0-1 for which a release date has not yet been scheduled).
