Sonu Mishra - 1 year ago 68

R Question

I have a huge dataset (`4M x 17`).

`> testMice <- mice(myData[1:100000,]) # runs fine`

`> testTot <- predict(testMice, myData)`

`Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "mids"`

Running the imputation on the whole dataset was computationally expensive, so I ran it on only the first 100K observations. I am now trying to use that output to impute the whole dataset.

Is there anything wrong with my approach? If yes, what should I do to make it correct? If no, then why am I getting this error?

Answer Source

Neither `mice` nor `Hmisc` provides the parameter estimates from the imputation process. Both `Amelia` and `imputeMulti` do. In both cases, you can extract the parameter estimates and use them for imputing your other observations.
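As for the error itself: `predict` is an S3 generic, and R raises this error whenever no `predict` method is registered for the class of the object it is given. The base-R sketch below reproduces the dispatch failure with a stand-in object. `fake_mids` is a hypothetical name and is not a real `mice` result; the sketch assumes, as the error message in the question implies, that no `predict.mids` method is available.

```r
# Stand-in object carrying the same class label as a mice() result.
# It is NOT a real imputation object; only the class attribute matters here.
fake_mids <- structure(list(), class = "mids")

# predict() is an S3 generic: it looks for predict.mids, then predict.default.
# With only base R loaded, neither exists, so dispatch fails.
msg <- tryCatch(
  predict(fake_mids),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the approach is not failing because of the subsample; there is simply nothing for `predict` to call on a `mids` object.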

`Amelia` assumes your data are distributed as a multivariate normal (e.g. $X \sim N(\mu, \Sigma)$). `imputeMulti` assumes that your data follow a multivariate multinomial distribution. That is, the complete cell counts are distributed $X \sim M(n, \theta)$, where $n$ is the number of observations.
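To make the idea of "estimate on a subsample, then impute the rest" concrete under the multivariate normal assumption, here is a minimal base-R sketch. It is not what `Amelia` does internally: the data are simulated, the parameters are estimated from complete cases of the subsample rather than by EM, and each missing value is filled with its conditional mean $\mu_m + \Sigma_{mo}\Sigma_{oo}^{-1}(x_o - \mu_o)$ given the observed entries. The names `mu_hat`, `sigma_hat` and `impute_row` are hypothetical.

```r
set.seed(1)

# Simulated stand-in for a large numeric dataset (hypothetical data).
n <- 1000
sigma_true <- matrix(c(1.0, 0.6, 0.3,
                       0.6, 1.0, 0.5,
                       0.3, 0.5, 1.0), nrow = 3)
full <- matrix(rnorm(n * 3), ncol = 3) %*% chol(sigma_true)
colnames(full) <- c("x1", "x2", "x3")

# Knock out about 10% of the entries at random.
dat <- full
dat[sample(length(dat), size = 0.1 * length(dat))] <- NA

# Step 1: estimate the parameters on a subsample only.
# Complete cases are used for simplicity; an EM fit would use every row.
sub <- dat[1:200, ]
sub_cc <- sub[complete.cases(sub), , drop = FALSE]
mu_hat <- colMeans(sub_cc)
sigma_hat <- cov(sub_cc)

# Step 2: fill each incomplete row with its conditional mean given the
# observed entries, using only mu_hat and sigma_hat.
impute_row <- function(x, mu, sigma) {
  m <- is.na(x)
  if (!any(m)) return(x)   # nothing to impute
  if (all(m)) return(mu)   # nothing observed: fall back to the mean
  o <- !m
  coef <- sigma[m, o, drop = FALSE] %*% solve(sigma[o, o, drop = FALSE])
  x[m] <- mu[m] + as.vector(coef %*% (x[o] - mu[o]))
  x
}

imputed <- t(apply(dat, 1, impute_row, mu = mu_hat, sigma = sigma_hat))
```

Because only `mu_hat` and `sigma_hat` are needed in step 2, the expensive estimation is done once on the subsample and the remaining rows are imputed row by row. A conditional mean gives a single deterministic fill-in; drawing from the conditional distribution instead would preserve more of the variability.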

Fitting can be done as follows, via example data. Examining parameter estimates is shown further below.

```
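library(Amelia)
library(imputeMulti)

# example data shipped with imputeMulti
data(tract2221, package = "imputeMulti")
test_dat2 <- tract2221[, c("gender", "marital_status", "edu_attain", "emp_status")]

# fitting
# imputeMulti: EM estimation with a non-informative conjugate prior
IM_EM <- multinomial_impute(test_dat2, "EM", conj_prior = "non.informative", verbose = TRUE)
# Amelia: a single imputation, treating all four columns as nominal
amelia_EM <- amelia(test_dat2, m = 1, noms = c("gender", "marital_status", "edu_attain", "emp_status"))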
```

- The parameter estimates from the `amelia` function are found in `amelia_EM$mu` and `amelia_EM$theta`.
- The parameter estimates in `imputeMulti` are found in `IM_EM@mle_x_y` and can be accessed via the `get_parameters` method.

`imputeMulti` has noticeably higher imputation accuracy for categorical data than any of the other three packages, though it only accepts multinomial (e.g. `factor`) data.

All of this information is in the **currently unpublished** vignette for `imputeMulti`. The paper has been submitted to JSS and I am awaiting a response before adding the vignette to the package.