f1r3br4nd - 1 month ago
R Question

Is there a better alternative than string manipulation to programmatically build formulas?

Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside and I'm jealous.

I'm writing a function that fits multiple models. Parts of the formulas for these models remain the same and parts change from one model to the next. The clumsy way would be to have the user input the formula parts as character strings, do some character manipulation on them, and then use as.formula.

But before I go that route, I just want to make sure that I'm not overlooking some cleaner way of doing it that would allow the function to accept formulas in the standard R format (e.g. extracted from other formula-using objects).

I want something like...

> LHS <- y~1; RHS <- ~a+b; c(LHS,RHS);
y ~ a + b
> RHS2 <- ~c;
> c(LHS, RHS, RHS2);
y ~ a + b + c


or...

> LHS + RHS;
y ~ a + b
> LHS + RHS + RHS2;
y ~ a + b + c


...but unfortunately neither syntax works. Does anybody know if there is something that does? Thanks.

Answer

reformulate will do what you want.

reformulate(termlabels = c('x','z'), response = 'y')
## y ~ x + z

Or without an intercept

reformulate(termlabels = c('x','z'), response = 'y', intercept = FALSE)
## y ~ x + z - 1

Note that you cannot construct formulae with multiple responses, such as x+y ~ z+b, this way:

reformulate(termlabels = c('x','y'), response = c('z','b'))
## z ~ x + y
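If several responses are needed, one possible workaround is to pass the response as an unevaluated call rather than a character vector, since reformulate accepts a symbol or call there. The sketch below assumes the responses should be modelled jointly as a matrix, which is what cbind() on the left-hand side expresses; whether that suits your model is a separate question.

```r
# Assumption: z and b are to be treated as a joint (matrix) response.
# quote() keeps cbind(z, b) unevaluated so it can be placed on the left-hand side.
f <- reformulate(termlabels = c("x", "y"), response = quote(cbind(z, b)))
f
## cbind(z, b) ~ x + y
```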

To extract the terms from an existing formula (given your example)

attr(terms(RHS), 'term.labels')
## [1] "a" "b"
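One thing worth knowing about term.labels is that terms() expands shorthand such as a * b before the labels are extracted, so the labels you get back are the expanded terms. A small illustration (the variable names are made up):

```r
# a * b is shorthand for a + b + a:b, and terms() expands it
labels <- attr(terms(~ a * b), "term.labels")
labels
## [1] "a"   "b"   "a:b"

# the expanded labels can be passed straight back to reformulate()
f <- reformulate(labels, response = "y")
f
## y ~ a + b + a:b
```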

Getting the response is slightly different. A simple approach (for a single-variable response) is

as.character(LHS)[2]
## [1] "y"
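The index 2 works because a two-sided formula is a call of length 3 (the ~ operator, the left-hand side, the right-hand side), whereas a one-sided formula has length 2. A rough sketch of a helper built on that, where get_response is a hypothetical name and not part of base R:

```r
LHS <- y ~ 1
RHS <- ~ a + b

length(LHS)  # 3: `~`, the response, the right-hand side
length(RHS)  # 2: `~`, the right-hand side

# Hypothetical helper: returns the response as a string,
# or NULL when the formula is one-sided.
# Note: deparse() can return several strings for very long expressions.
get_response <- function(f) {
  if (length(f) == 3L) deparse(f[[2]]) else NULL
}

get_response(LHS)         # "y"
get_response(RHS)         # NULL
get_response(log(y) ~ x)  # "log(y)"
```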


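# Combine a two-sided formula (supplying the response) with a list of one-sided formulas
combine_formula <- function(LHS, RHS){
  .terms <- lapply(RHS, terms)
  # pool the term labels from every right-hand side, dropping duplicates
  new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
  # as.character() on a two-sided formula gives c("~", lhs, rhs)
  response <- as.character(LHS)[2]

  reformulate(new_terms, response)
}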


combine_formula(LHS, list(RHS, RHS2))

## y ~ a + b + c
## <environment: 0x577fb908>

I think it would be more sensible to specify the response as a character vector, something like

combine_formula2 <- function(response, RHS, intercept = TRUE){
  .terms <- lapply(RHS, terms)
  # pool the term labels from every right-hand side, dropping duplicates
  new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))

  # response is already a character string, so it is passed on unchanged
  reformulate(new_terms, response, intercept)
}
combine_formula2('y', list(RHS, RHS2))
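For the use case in the question (fitting several models whose formulas share a fixed part), the same idea might be applied along these lines. This is only a sketch: the built-in mtcars data, the choice of lm, and the names fixed, varying, formulas and fits are all assumptions made for illustration.

```r
# Assumed setup: one fixed right-hand side and several varying ones
fixed   <- ~ wt
varying <- list(~ hp, ~ qsec, ~ hp + qsec)

# Build one formula per varying part by pooling term labels
formulas <- lapply(varying, function(rhs) {
  labels <- unique(c(attr(terms(fixed), "term.labels"),
                     attr(terms(rhs),   "term.labels")))
  reformulate(labels, response = "mpg")
})

# Fit one linear model per formula
fits <- lapply(formulas, lm, data = mtcars)

formulas[[3]]
## mpg ~ wt + hp + qsec
```

Because the pieces stay as formula objects until the last step, the caller can pass formulas extracted from other objects instead of character strings.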

You could also define a + operator for formulae by setting a new method for formula objects:

`+.formula` <- function(e1,e2){
  .terms <- lapply(c(e1,e2), terms)
  reformulate(unique(unlist(lapply(.terms, attr, which = 'term.labels'))))
}

RHS + RHS2
## ~a + b + c
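Note that the method above always returns a one-sided formula, so a response on either side is lost. If you want LHS + RHS to behave as in the question, a variant that keeps the response of the first operand could look roughly like this. It is a sketch that assumes only the left operand may carry a response and that at least one operand contributes a term.

```r
# Variant: keep the response of e1 (if any) and pool the terms of both sides
`+.formula` <- function(e1, e2) {
  labels <- unique(c(attr(terms(e1), "term.labels"),
                     attr(terms(e2), "term.labels")))
  # a two-sided formula has length 3, with the response in position 2
  response <- if (length(e1) == 3L) e1[[2]] else NULL
  reformulate(labels, response = response)
}

LHS  <- y ~ 1
RHS  <- ~ a + b
RHS2 <- ~ c

LHS + RHS
## y ~ a + b
LHS + RHS + RHS2
## y ~ a + b + c
```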

You can also use update.formula, using . judiciously:

update(~a+b, y ~ .)
## y ~ a + b
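In update(), a . on either side of the new formula stands for the corresponding side of the old one, which also makes it convenient for adding or dropping terms. A few illustrative calls (the formula and the names base, added, dropped and logged are made up for the example):

```r
base <- y ~ a + b

added   <- update(base, . ~ . + c)   # keep both sides, add a term
dropped <- update(base, . ~ . - b)   # keep both sides, drop a term
logged  <- update(base, log(.) ~ .)  # transform the response only

added
## y ~ a + b + c
dropped
## y ~ a
logged
## log(y) ~ a + b
```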