gjabel gjabel - 3 months ago 8
R Question

Adding a column for the category of glm coefficients in broom results

Is there any way to add a column to the result of the broom package's tidy function that relates the term column back to both the original names used in the formula argument and their columns in the data argument?

For example if I run the following I get:

library(broom)  # provides tidy()
library(dplyr)

mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)

tidy(mod)

# term estimate std.error statistic p.value
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02
# 4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01


What I am looking for is something like this:

# term estimate std.error statistic p.value term_base
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
# 4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb


I am not bothered whether the first row in this new column is empty, Intercept, or 1. I just need something that can match the term column to the original variable names passed to the formula.

Edit

It would be good if it didn't depend on using as.factor in the formula, e.g. if it also worked on:

mod <- glm(mpg ~ wt + qsec + carb, data = mtcars %>% mutate(carb = factor(carb)))

tidy(mod)

# term estimate std.error statistic p.value
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02
# 4 carb2 0.004133826 1.5321134 0.00269812 9.978695e-01
# 5 carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01

Answer

We can use a regex to create the 'term_base' column:

tidy(mod) %>%
        mutate(term_base = sub("Intercept", "", gsub(".*\\(|\\).*", "", term)))
#              term     estimate std.error   statistic      p.value term_base
#1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02          
#2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
#3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
#4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
#5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
#6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
#7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
#8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb
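
To see what the pattern does, it may help to run it on the term names alone. The sketch below uses a hand-typed vector, terms_chr, as an assumed stand-in for tidy(mod)$term. The gsub drops everything up to an opening parenthesis and everything from a closing parenthesis onwards, and the sub then blanks the leftover "Intercept".

```r
# Hypothetical stand-in for tidy(mod)$term
terms_chr <- c("(Intercept)", "wt", "qsec", "as.factor(carb)2")

# Step 1: keep only what sits inside the parentheses (names without
# parentheses pass through unchanged)
inner <- gsub(".*\\(|\\).*", "", terms_chr)
# inner is c("Intercept", "wt", "qsec", "carb")

# Step 2: blank out the intercept label
term_base <- sub("Intercept", "", inner)
# term_base is c("", "wt", "qsec", "carb")
```

Note that step 2 would also strip the text "Intercept" from any variable whose name happens to contain it.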

The as.factor can also be dropped from 'term' if we convert 'carb' to a factor before the glm step:

mtcars %>%
     mutate(carb = factor(carb)) %>% 
     glm(formula = mpg ~ wt + qsec + carb, data = .) %>% 
     tidy(.) %>%
     mutate(term_base = sub("\\(.*\\)|\\d+", "", term))
#     term     estimate std.error   statistic      p.value term_base
#1 (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02          
#2          wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
#3        qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
#4       carb2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
#5       carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
#6       carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
#7       carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
#8       carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb
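
Both patterns depend on how the terms happen to be spelled. For instance, the \\d+ in the second one would also strip the digit from a numeric variable named x1. A possible alternative, sketched here rather than offered as a definitive solution, is to read the mapping from the model itself: the "assign" attribute of the model matrix records which formula term each coefficient came from, and all.vars() recovers the variable names inside a term label. The names term_labels, term_index, base_names, lookup and result are illustrative only.

```r
library(broom)
library(dplyr)

mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)

# Term labels as written in the formula: "wt", "qsec", "as.factor(carb)"
term_labels <- attr(terms(mod), "term.labels")

# One entry per model matrix column: the index of the term that produced it
# (0 marks the intercept)
mm <- model.matrix(mod)
term_index <- attr(mm, "assign")

# Strip wrappers such as as.factor() to get the underlying variable name(s);
# terms involving several variables are joined with ":"
base_names <- vapply(
  term_labels,
  function(lbl) paste(all.vars(parse(text = lbl)), collapse = ":"),
  character(1)
)

# Lookup table from coefficient name to base variable ("" for the intercept)
lookup <- data.frame(
  term = colnames(mm),
  term_base = unname(c("", base_names)[term_index + 1]),
  stringsAsFactors = FALSE
)

result <- tidy(mod) %>%
  left_join(lookup, by = "term")
```

Because the lookup is built from the fitted model rather than from the text of the term names, the same code should give the same term_base column for the mpg ~ wt + qsec + carb version where carb is converted to a factor beforehand.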