Ben Bolker - 1 year ago 143

R Question

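I have a formula that contains some terms and a data frame (the output of an earlier `model.frame()` call) whose columns include those terms and more. I want to keep only the columns that appear in the formula.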

```
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
                 `log(1+Days)`=1:4,
                 x=1:4,
                 y=1:4,
                 z=1:4,
                 check.names=FALSE)
```

The desired result is `fr` without the `z` column, obtained programmatically rather than by hard-coding something like `fr[,1:4]`.

Some strategies that don't work:

```
fr[all.vars(ff)]
## Error in `[.data.frame`(fr, all.vars(ff)) : undefined columns selected
```

(because `all.vars()` returns `"Reaction"`, not `"log(Reaction)"`).
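To see the mismatch concretely, here is a small illustrative check (it just rebuilds `ff` and `fr` as defined in the question; `av` and `missing_cols` are names made up for this sketch). It compares what `all.vars()` returns with the column names that `fr` actually has:

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4, `log(1+Days)`=1:4,
                 x=1:4, y=1:4, z=1:4, check.names=FALSE)

# all.vars() returns the bare symbols, not the transformed terms
av <- all.vars(ff)
av
## [1] "Reaction" "Days"     "x"        "y"

# names produced by all.vars() that are not columns of fr
missing_cols <- setdiff(av, names(fr))
missing_cols
## [1] "Reaction" "Days"
```

The two missing names are exactly the ones that appear inside a function call in the formula.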

```
stripwhite <- function(x) gsub("(^ +| +$)","",x)
vars <- stripwhite(unlist(strsplit(as.character(ff)[-1],"\\+")))
fr[vars]
## Error in `[.data.frame`(fr, vars) : undefined columns selected
```

(because splitting on `+` also splits the `log(1+Days)` term in two).
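For illustration, printing the intermediate vector shows where the split goes wrong (a sketch reusing the question's `ff` and `stripwhite()`; `pieces` is just a name chosen here):

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
stripwhite <- function(x) gsub("(^ +| +$)", "", x)

# as.character(ff)[-1] gives the deparsed left- and right-hand sides;
# splitting on every "+" also cuts inside log(1 + Days)
pieces <- stripwhite(unlist(strsplit(as.character(ff)[-1], "\\+")))
pieces
## [1] "log(Reaction)" "log(1"         "Days)"         "x"             "y"
```

Neither `"log(1"` nor `"Days)"` is a column of `fr`, hence the error.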

I've been thinking about walking down the parse tree of the formula:

```
ff[[3]]       ## log(1 + Days) + x + y
ff[[3]][[1]]  ## `+`
ff[[3]][[2]]  ## log(1 + Days) + x
```

but I haven't got a solution put together, and it seems like I'm going down a rabbit hole. Ideas?

Answer Source

This should work:

```
> fr[gsub(" ","",rownames(attr(terms.formula(ff), "factors")))]
  log(Reaction) log(1+Days) x y
1             1           1 1 1
2             2           2 2 2
3             3           3 3 3
4             4           4 4 4
```

And props to Roman Luštrik for pointing me in the right direction.

Edit: Looks like you could pull it out of the "variables" attribute as well:

```
fr[gsub(" ","",attr(terms(ff),"variables")[-1])]
```
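If you want this as a reusable function, something along these lines might do. This is only a sketch: `select_formula_vars()` and `strip_spaces()` are hypothetical names, not part of any package. It also compares names with spaces removed on both sides, on the assumption that a real `model.frame()` result may carry names like `log(1 + Days)` rather than the space-free names in the question's example.

```r
# Hypothetical helper: keep only the columns of `data` that match the
# variables of `formula` (names are compared with spaces removed)
select_formula_vars <- function(formula, data) {
  strip_spaces <- function(s) gsub(" ", "", s, fixed = TRUE)
  vars <- attr(terms(formula), "variables")
  # drop the leading `list` symbol, then deparse each variable expression
  nm <- vapply(as.list(vars)[-1],
               function(e) paste(deparse(e, width.cutoff = 500L), collapse = ""),
               character(1))
  idx <- match(strip_spaces(nm), strip_spaces(names(data)))
  if (anyNA(idx)) {
    stop("columns not found: ", paste(nm[is.na(idx)], collapse = ", "))
  }
  data[idx]
}

ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4, `log(1+Days)`=1:4,
                 x=1:4, y=1:4, z=1:4, check.names=FALSE)
res <- select_formula_vars(ff, fr)
names(res)
```

Stripping spaces is still a heuristic: it would confuse two columns whose names differ only in whitespace.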

Edit 2: Found the first problem case, involving `I()` or `offset()`:

```
ff <- I(log(Reaction)) ~ I(log(1+Days)) + x + y
fr[gsub(" ","",attr(terms(ff),"variables")[-1])]
```

Those would be pretty easy to correct with a regex, though. BUT if, as in the question, a column is literally named, e.g., `log(x)` and is used in a formula alongside something like `I(log(y))` for variable `y`, this will get really messy.
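As an alternative to a regex for the simple `I()` case, one could unwrap the call on the parsed expression before deparsing. The following is a rough sketch, not a general fix: `unwrap_I()` and `ff2` are made-up names, only a single outer `I()` is removed, `offset()` is not handled, and it does nothing about the ambiguous situation just described.

```r
fr <- data.frame(`log(Reaction)`=1:4, `log(1+Days)`=1:4,
                 x=1:4, y=1:4, z=1:4, check.names=FALSE)
ff2 <- I(log(Reaction)) ~ I(log(1+Days)) + x + y

# Hypothetical helper: drop one outer I() wrapper from an expression, if present
unwrap_I <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("I"))) e[[2]] else e
}

# variables of the formula as a list of expressions (leading `list` removed)
vars <- as.list(attr(terms(ff2), "variables"))[-1]

# deparse each unwrapped expression and remove spaces, as in the answer above
nm <- vapply(vars, function(e) {
  gsub(" ", "", paste(deparse(unwrap_I(e)), collapse = ""), fixed = TRUE)
}, character(1))

res <- fr[nm]
names(res)
```

Working on the expression rather than the string at least avoids having to balance parentheses in a regex.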