majom - 9 months ago 78

R Question

There are multiple packages for

`R`

Some of these packages are

`apsrtable`

`xtable`

`memisc`

`texreg`

`outreg`

`stargazer`

`R`

`h2o`

Here is an example of two simple GLM models with

`h2o`

`# Load package and setup h2o`

library(h2o)

localH2O <- h2o.init(ip = 'localhost', port = 54321, max_mem_size = '4g')

# Load data

prostatePath <- system.file("extdata", "prostate.csv", package = "h2o")

prostate.hex <- h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")

# Run GLMs

model.output.1 <- h2o.glm(y = "CAPSULE", x = c("RACE","PSA","DCAPS"),

training_frame = prostate.hex,family = "binomial", nfolds = 0,

alpha = 0.5, lambda_search = FALSE)

model.output.2 <- h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),

training_frame = prostate.hex, family = "binomial", nfolds = 0,

alpha = 0.5, lambda_search = FALSE)

This is how it would look like with an regular GLM object using

`screenreg()`

`texreg`

`library(data.table)`

library(texreg)

d <- fread(prostatePath)

model.output.1.glm <- glm(CAPSULE ~ RACE + PSA + DCAPS, data=d)

model.output.2.glm <- glm(CAPSULE ~ AGE + RACE + PSA + DCAPS, data=d)

screenreg(list(model.output.1.glm, model.output.2.glm))

Answer

`texreg`

package author here. `texreg`

is based on generic functions, which means any user can add custom `extract`

methods for arbitrary models and make them work with `texreg`

. I will walk you through this below. I am hoping that this detailed exposition will help other people who have asked similar questions in the past to devise their own `texreg`

extensions.

See also Section 6 in the 2013 paper in the Journal of Statistical Software for another example. First, however, I will describe how the `texreg`

architecture works more generally to give you an idea what is possible.

`texreg`

and the generic `extract`

functionThere are three functions: `texreg`

(for LaTeX output), `htmlreg`

(for HTML output, which can also be interpreted by Word or Markdown in most usage scenarios), and `screenreg`

(for ASCII text output in the console). These three functions serve to convert some cleaned-up information (coefficients, standard errors, confidence intervals, p-values, goodness-of-fit statistics, model labels etc.) into the respective output formats. This is not perfect and could be even more flexible, but I think it has quite a few arguments for customization at this point, including things like `booktabs`

and `dcolumn`

support. So the big challenge is rather to get cleaned-up model information in the first place.

This is done by providing a `texreg`

object to any of these three functions. A `texreg`

object is just a container for coefficients etc. and is formally defined using an S4 class. To create a `texreg`

object, you can either use the constructor function `createTexreg`

(as documented on the help pages), which accepts all the different pieces of information as arguments, such as the standard errors etc. Or (better) you can use the `extract`

function to extract those pieces of information from some estimated model and return a `texreg`

object for use with any of the three functions. The way you typically do this is by just handing over a list of several models to the `texreg`

or `screenreg`

etc. function, and this function will internally call `extract`

to create `texreg`

objects and then process the information from those objects.

However, it is equally valid to just save the output of a call of the `extract`

function to an object, possibly manipulate this `texreg`

object, and then call the `texreg`

function on the manipulated object to display it as a table. This permits some flexibility in tweaking the results.

Earlier on, I mentioned that the package uses generic functions. This means that the `extract`

function is generic in the sense that you can register methods for it that work with arbitrary classes of models. For example, if the `extract`

function does not know how to handle `h2o`

objects and how to extract the relevant information from such an object, you can just write a method to do that and register it with the `extract`

function. Below, I will walk you through this step by step in the hope that people will learn from this detailed exposition and start writing their own extensions. (Note: if somebody develops a useful method, please e-mail me and I can include it in the next `texreg`

release.) It is also worth pointing out that the source files of the package contain more than 70 examples of `extract`

methods which you can use as templates. These examples are stored in the file `R/extract.R`

.

`extract`

methodThe first step is to identify the class name of the object. In your example, `class(model.output.1)`

returns the following class labels: "H2OBinomialModel" and "h2o". The first label is the more specific one, and the second label is the more general one. If all `h2o`

model objects are structured in a similar way, it will make sense to write an extension for `h2o`

objects and then decide within the method how to proceed with the specific model. As I know virtually nothing about the `h2o`

package, I prefer to start out with a more specific `extract`

method for `H2OBinomialModel`

objects in this case. It can be adjusted later on should we wish to do so.

`extract`

methods are structured as follows: you write a function called `extract.xyz`

(replace "xyz" by the class label), have at least one argument called `model`

, which accepts the model object (e.g., `model.output.1`

in your example), put some code in the body that extracts the relevant information from the `model`

object, create a `texreg`

object using the `createTexreg`

constructor, and return this object. Here is an empty container:

```
extract.H2OBinomialModel <- function(model, ...) {
s <- summary(model)
# extract information from model and summary object here
# then create and return a texreg object (replace NULL with actual values):
tr <- createTexreg(
coef.names = NULL, # character vector of coefficient labels
coef = NULL, # numeric vector with coefficients
se = NULL, # numeric vector with standard error values
pvalues = NULL, # numeric vector with p-values
gof.names = NULL, # character vector with goodness-of-fit labels
gof = NULL, # numeric vector of goodness-of-fit statistics
gof.decimal = NULL # logical vector: GOF statistic has decimal points?
)
return(tr)
}
```

Note that the function definition also contains the `...`

argument, which can be used for custom arguments that should be handed over to function calls within the `extract`

method.

Note also that the first line in the body of the function definition saves the summary of the model in an object called `s`

. This is often useful because many package writers decide to store some of the information in a simpler version in the summary, so one should usually consider both, the model and its summary, as useful sources of information. In some cases, one may have to look at the actual definition of the summary method in the respective package to find out how the pieces of information displayed on the summary page are computed when the `summary`

command is called because not all `summary`

methods store the different elements displayed in the `summary`

object.

`H2OBinomialModel`

objectThe next step is to examine the object and locate all the details that should be displayed in the final table. By looking at the output of `model.output.1`

, I would guess that the following part should constitute the GOF block at the bottom of the table:

```
MSE: 0.202947
R^2: 0.1562137
LogLoss: 0.5920097
Mean Per-Class Error: 0.3612191
AUC: 0.7185655
Gini: 0.4371311
Null Deviance: 512.2888
Residual Deviance: 449.9274
AIC: 457.9274
```

And the following part should probably constitute the coefficient block in the middle of the table:

```
Coefficients: glm coefficients
names coefficients standardized_coefficients
1 Intercept -1.835223 -0.336428
2 RACE -0.625222 -0.193052
3 DCAPS 1.314428 0.408336
4 PSA 0.046861 0.937107
```

In many cases, the summary contains the relevant information, but here printing the model yields what we need. We will need to locate all of this in the `model.output.1`

object (or its summary if applicable). To do that, there are several useful commands. Among them are `str(model.output.1)`

, `names(summary(model.output.1))`

, and similar commands.

Let's start with the coefficient block. Calling `str(model)`

reveals that there is a slot called `model`

in the S4 object. We can look at its contents by calling `model.output.1@model`

. The result is a list with several named elements, among them `coefficients_table`

. So we can access the coefficient table by calling `model.output.1@model$coefficients_table`

. The result is a data frame the columns of which we can access using the `$`

operator. In particular, we need the names and the coefficients. There are two types of coefficients here, standardized and unstandardized, and we can add an argument to our extract method later to let the user decide what he or she wants. Here is how we extract the coefficients and their labels:

```
coefnames <- model.output.1@model$coefficients_table$names
coefs <- model.output.1@model$coefficients_table$coefficients
coefs.std <- model.output.1@model$coefficients_table$standardized_coefficients
```

As far as I can see, there are no standard errors or p-values stored in the object. We could write some additional code to compute them should we wish to do that, but here we will focus on things that are readily provided as part of the model output.

It is important that we should not overwrite any existing function names in `R`

, such as `names`

or `coef`

. While doing this should technically work because the code is executed within a function later, this can easily lead to confusion while trying things out, so you should better avoid that.

Next, we need to locate the goodness-of-fit statistics. By examining the output of `str(model.output.1)`

carefully, we see that the goodness-of-fit statistics are contained in several slots under `model@model$training_metrics@metrics`

. Let's save them to some objects that are easier to access:

```
mse <- model.output.1@model$training_metrics@metrics$MSE
r2 <- model.output.1@model$training_metrics@metrics$r2
logloss <- model.output.1@model$training_metrics@metrics$logloss
mpce <- model.output.1@model$training_metrics@metrics$mean_per_class_error
auc <- model.output.1@model$training_metrics@metrics$AUC
gini <- model.output.1@model$training_metrics@metrics$Gini
nulldev <- model.output.1@model$training_metrics@metrics$null_deviance
resdev <- model.output.1@model$training_metrics@metrics$residual_deviance
aic <- model.output.1@model$training_metrics@metrics$AIC
```

In some cases, but not here, the author of a package writes methods for generic functions that can be used to extract some common information like the number of observations (`nobs(model)`

), AIC (`AIC(model)`

), BIC (`BIC(model)`

), deviance (`deviance(model)`

), or the log likelihood (`logLik(model)[[1]]`

). So these are things you may want to try first; but the `h2o`

package does not seem to offer such convenience methods.

`extract.H2OBinomialModel`

functionNow that we have located all pieces of information we need, we can add them to the `extract.H2OBinomialModel`

function we defined above.

However, we would like to let the user decide if he or she prefers raw or standardized coefficients, and we would like to let the user decide which goodness-of-fit statistics should be reported, so we add various logical arguments to the function header and then use if-conditions inside the function to check whether we should embed the respective statistics in the resulting `texreg`

object.

We also remove the line `s <- summary(model)`

in this case because we don't actually need any kind of information from the summary since we found everything we need in the model object.

The complete function may look like this:

```
# extension for H2OBinomialModel objects (h2o package)
extract.H2OBinomialModel <- function(model, standardized = FALSE,
include.mse = TRUE, include.rsquared = TRUE, include.logloss = TRUE,
include.meanerror = TRUE, include.auc = TRUE, include.gini = TRUE,
include.deviance = TRUE, include.aic = TRUE, ...) {
# extract coefficient table from model:
coefnames <- model@model$coefficients_table$names
if (standardized == TRUE) {
coefs <- model@model$coefficients_table$standardized_coefficients
} else {
coefs <- model@model$coefficients_table$coefficients
}
# create empty GOF vectors and subsequently add GOF statistics from model:
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.mse == TRUE) {
mse <- model@model$training_metrics@metrics$MSE
gof <- c(gof, mse)
gof.names <- c(gof.names, "MSE")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.rsquared == TRUE) {
r2 <- model@model$training_metrics@metrics$r2
gof <- c(gof, r2)
gof.names <- c(gof.names, "R^2")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.logloss == TRUE) {
logloss <- model@model$training_metrics@metrics$logloss
gof <- c(gof, logloss)
gof.names <- c(gof.names, "LogLoss")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.meanerror == TRUE) {
mpce <- model@model$training_metrics@metrics$mean_per_class_error
gof <- c(gof, mpce)
gof.names <- c(gof.names, "Mean Per-Class Error")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.auc == TRUE) {
auc <- model@model$training_metrics@metrics$AUC
gof <- c(gof, auc)
gof.names <- c(gof.names, "AUC")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.gini == TRUE) {
gini <- model@model$training_metrics@metrics$Gini
gof <- c(gof, gini)
gof.names <- c(gof.names, "Gini")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.deviance == TRUE) {
nulldev <- model@model$training_metrics@metrics$null_deviance
resdev <- model@model$training_metrics@metrics$residual_deviance
gof <- c(gof, nulldev, resdev)
gof.names <- c(gof.names, "Null Deviance", "Residual Deviance")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.aic == TRUE) {
aic <- model@model$training_metrics@metrics$AIC
gof <- c(gof, aic)
gof.names <- c(gof.names, "AIC")
gof.decimal <- c(gof.decimal, TRUE)
}
# create texreg object:
tr <- createTexreg(
coef.names = coefnames,
coef = coefs,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
```

For the goodness-of-fit block, you can see that I first created empty vectors and then subsequently populated them with additional statistics provided that the respective statistic was switched on by the user using the respective argument.

The `gof.decimal`

logical vector indicates for each GOF statistic whether it has decimal places (`TRUE`

) or not (`FALSE`

, as in the number of observations, for example).

Finally, the new function needs to be registered as a method for the generic `extract`

function. This is done using a simple command:

```
setMethod("extract", signature = className("H2OBinomialModel", "h2o"),
definition = extract.H2OBinomialModel)
```

Here, the first argument of `className`

is the class label, and the second one is the package in which the class is defined.

Summing up, the only two things that need to be done in order to write a custom extension are 1) writing an extract method, and 2) registering the method. That is, this code can be executed at runtime and doesn't have to be inserted into any package.

However, for your convenience, I have added the `H2OBinomialModel`

method to `texreg`

version 1.36.13, which should be available within a couple of hours (or possibly days) on the R-Forge project homepage under "R Packages".

Note that the solution presented here does not work with any previous version of `texreg`

because previous versions could not deal with models that had neither standard errors nor confidence intervals. This is a fairly specialized setup in my opinion, and I hadn't come across a package that merely provides the estimates without any uncertainty measures. I have now fixed this in `texreg`

.

`extract`

methodOnce the function definition and the `setMethod`

command have been executed at runtime, the following command can be used to create a table:

```
screenreg(list(model.output.1, model.output.2), custom.note = "")
```

This is the output:

```
======================================
Model 1 Model 2
--------------------------------------
Intercept -1.84 -1.11
RACE -0.63 -0.62
DCAPS 1.31 1.31
PSA 0.05 0.05
AGE -0.01
--------------------------------------
MSE 0.20 0.20
R^2 0.16 0.16
LogLoss 0.59 0.59
Mean Per-Class Error 0.36 0.38
AUC 0.72 0.72
Gini 0.44 0.44
Null Deviance 512.29 512.29
Residual Deviance 449.93 449.51
AIC 457.93 459.51
======================================
```

The `custom.note = ""`

argument makes sense here because we don't want a significance legend since the models do not report any uncertainty measures.

It is also possible to suppress some of the GOF measures and/or use standardized coefficients:

```
screenreg(list(model.output.1, model.output.2), custom.note = "",
include.deviance = FALSE, include.auc = FALSE, standardized = TRUE)
```

The result:

```
======================================
Model 1 Model 2
--------------------------------------
Intercept -0.34 -0.34
RACE -0.19 -0.19
DCAPS 0.41 0.41
PSA 0.94 0.94
AGE -0.07
--------------------------------------
MSE 0.20 0.20
R^2 0.16 0.16
LogLoss 0.59 0.59
Mean Per-Class Error 0.36 0.38
Gini 0.44 0.44
AIC 457.93 459.51
======================================
```

`createTexreg`

The `createTexreg`

constructor function is called within any `extract`

method. The example above shows some simple pieces of information. Besides the details contained in the example, the following slots are available in `texreg`

objects:

- se: standard errors
- pvalues: the p-values
- ci.low: lower bound of a confidence interval
- ci.up: upper bound of a confidence interval
- model.name: the title of the current model

Note that either confidence intervals or standard errors and p-values should be used, not both.

`texreg`

objectsBesides handing over the models to the `screenreg`

, `texreg`

, or `htmlreg`

functions directly, it is possible to save the extracted information to a `texreg`

object first and manipulate it before the table is displayed or saved. The added value is that even complex changes to a table are easy to apply this way. For example, it is possible to rename coefficients or GOF statistics, add new rows, rename a model, modify the values, or change the order of coefficients or GOF statistics. Here is how you can do that: First, you call the `extract`

function to save the information to a `texreg`

object:

```
tr <- extract(model.output.1)
```

You can display the contents of the `texreg`

object by just calling `tr`

, which shows the following output:

```
No standard errors and p-values were defined for this texreg object.
coef.
Intercept -1.83522343
RACE -0.62522179
DCAPS 1.31442834
PSA 0.04686106
GOF dec. places
MSE 0.2029470 TRUE
R^2 0.1562137 TRUE
LogLoss 0.5920097 TRUE
Mean Per-Class Error 0.3612191 TRUE
AUC 0.7185655 TRUE
Gini 0.4371311 TRUE
Null Deviance 512.2888402 TRUE
Residual Deviance 449.9273825 TRUE
AIC 457.9273825 TRUE
```

Alternatively, this is the structure of the object as shown by `str(tr)`

:

```
Formal class 'texreg' [package "texreg"] with 10 slots
..@ coef.names : chr [1:4] "Intercept" "RACE" "DCAPS" "PSA"
..@ coef : num [1:4] -1.8352 -0.6252 1.3144 0.0469
..@ se : num(0)
..@ pvalues : num(0)
..@ ci.low : num(0)
..@ ci.up : num(0)
..@ gof.names : chr [1:9] "MSE" "R^2" "LogLoss" "Mean Per-Class Error" ...
..@ gof : num [1:9] 0.203 0.156 0.592 0.361 0.719 ...
..@ gof.decimal: logi [1:9] TRUE TRUE TRUE TRUE TRUE TRUE ...
..@ model.name : chr(0)
```

Now you can just manipulate this object in arbitrary ways, e.g., add a GOF statistic:

```
tr@gof.names <- c(tr@gof.names, "new statistic")
tr@gof <- c(tr@gof, 12)
tr@gof.decimal <- c(tr@gof.decimal, FALSE)
```

Or you can change the order of coefficients:

```
tr@coef.names <- tr@coef.names[c(4, 1, 2, 3)]
tr@coef <- tr@coef[c(4, 1, 2, 3)]
```

When you are done with the manipulations, you can hand over the `texreg`

object instead of the original model when you call, for example, `screenreg`

:

```
screenreg(list(tr, model.output.2), custom.note = "")
```

The new result will look like this:

```
======================================
Model 1 Model 2
--------------------------------------
PSA 0.05 0.05
Intercept -1.84 -1.11
RACE -0.63 -0.62
DCAPS 1.31 1.31
AGE -0.01
--------------------------------------
MSE 0.20 0.20
R^2 0.16 0.16
LogLoss 0.59 0.59
Mean Per-Class Error 0.36 0.38
AUC 0.72 0.72
Gini 0.44 0.44
Null Deviance 512.29 512.29
Residual Deviance 449.93 449.51
AIC 457.93 459.51
new statistic 12
======================================
```

`texreg`

can be customized by users. Just write an extract method like the one shown above and register it using a `setMethods`

call. I have included the `H2OBinomialModel`

method in the latest `texreg`

version 1.36.13, along with a bugfix for using models without standard errors (such as this one).