Vincent - 4 months ago 10

R Question

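I have a formula and a data frame, and I want to extract the `model.matrix()`. However, I need the result to keep the rows that contain `NA`s. With `model.frame()` I can keep them by passing `na.action=NULL`, but I need the output in `model.matrix()` form: a matrix rather than a data frame, with factors expanded into dummy variables.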

I'm sure I could hack something together using loops or something, but I was wondering if anyone could suggest a cleaner and more efficient workaround. Thanks a lot for your time!

And here's an example:

```
dat <- data.frame(matrix(rnorm(20),5,4), gl(5,2))
dat[3,5] <- NA
names(dat) <- c(letters[1:4], 'fact')
ff <- a ~ b + fact

# This omits the row with a missing observation on the factor
model.matrix(ff, dat)

# This keeps the NA, but it gives me a data frame and does not dichotomize the factor
model.frame(ff, dat, na.action=NULL)
```

Here is what I would like to obtain:

```
(Intercept) b fact2 fact3 fact4 fact5
1 1 0.7266086 0 0 0 0
2 1 -0.6088697 0 0 0 0
3 NA 0.4643360 NA NA NA NA
4 1 -1.1666248 1 0 0 0
5 1 -0.7577394 0 1 0 0
6 1 0.7266086 0 1 0 0
7 1 -0.6088697 0 0 1 0
8 1 0.4643360 0 0 1 0
9 1 -1.1666248 0 0 0 1
10 1 -0.7577394 0 0 0 1
```

Answer

You can mess around a little with the `model.matrix` object, based on the rownames:

```
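MM <- model.matrix(ff,dat)
# Re-expand to one row per row of dat; rows dropped for NAs become all-NA
MM <- MM[match(rownames(dat),rownames(MM)),]
# Restore the observed values of b, which are known even where fact is NA
MM[,"b"] <- dat$b
rownames(MM) <- rownames(dat)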
```

which gives:

```
> MM
(Intercept) b fact2 fact3 fact4 fact5
1 1 0.9583010 0 0 0 0
2 1 0.3266986 0 0 0 0
3 NA 1.4992358 NA NA NA NA
4 1 1.2867461 1 0 0 0
5 1 0.5024700 0 1 0 0
6 1 0.9583010 0 1 0 0
7 1 0.3266986 0 0 1 0
8 1 1.4992358 0 0 1 0
9 1 1.2867461 0 0 0 1
10 1 0.5024700 0 0 0 1
```
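The trick relies on two behaviours: `model.matrix()` keeps the original row names of the rows it retains, so `match()` returns `NA` for any row that was dropped, and indexing a matrix with an `NA` row index produces a row of `NA`s. A minimal sketch with a made-up matrix (the names `m` and `idx` are illustrative only, not part of the answer):

```r
# Toy matrix standing in for a model matrix that lost row "2"
m <- matrix(1:4, nrow = 2, dimnames = list(c("1", "3"), c("x", "y")))

# Look up every original row name; the missing one maps to NA
idx <- match(c("1", "2", "3"), rownames(m))
idx  # 1 NA 2

# An NA row index yields a row filled with NA
m[idx, ]
```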

Alternatively, you can use `contrasts()` to do the work for you. Constructing the matrix by hand would be:

```
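# Look up each observation's contrast row; an NA level yields an all-NA row
cont <- contrasts(dat$fact)[as.numeric(dat$fact),]
colnames(cont) <- paste("fact",colnames(cont),sep="")
# Bind the intercept column, b, and the dummy columns
out <- cbind(1,dat$b,cont)
# Mark the intercept as missing wherever the factor is missing
out[is.na(dat$fact),1] <- NA
colnames(out)[1:2]<- c("Intercept","b")
rownames(out) <- rownames(dat)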
```

which gives:

```
> out
Intercept b fact2 fact3 fact4 fact5
1 1 0.2534288 0 0 0 0
2 1 0.2697760 0 0 0 0
3 NA -0.8236879 NA NA NA NA
4 1 -0.6053445 1 0 0 0
5 1 0.4608907 0 1 0 0
6 1 0.2534288 0 1 0 0
7 1 0.2697760 0 0 1 0
8 1 -0.8236879 0 0 1 0
9 1 -0.6053445 0 0 0 1
10 1 0.4608907 0 0 0 1
```

In any case, both methods can be incorporated in a function that can deal with more complex formulae. I leave the exercise to the reader (how I loathe that sentence when I meet it in a paper ;-) )
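One possible starting point for such a function, offered as a sketch and not as either of the two methods above: build the model frame with `na.action = na.pass` and hand that frame to `model.matrix()`. The helper name `model_matrix_keep_na` is hypothetical. As far as I can tell, this leaves the intercept at 1 in rows with a missing factor and puts `NA` only in the affected dummy columns, so it differs slightly from the output requested in the question; set the intercept to `NA` afterwards if you need that.

```r
# Hypothetical helper (not from the answer above): build a model matrix
# without dropping rows that contain missing values.
model_matrix_keep_na <- function(formula, data) {
  # na.pass keeps incomplete rows in the model frame instead of omitting them
  mf <- model.frame(formula, data, na.action = na.pass)
  # model.matrix() uses a model frame as-is, so no rows are removed here
  model.matrix(formula, mf)
}

# Usage with the data from the question
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")
mm <- model_matrix_keep_na(a ~ b + fact, dat)
nrow(mm)  # 10, one row per row of dat
```

Because the formula is passed through unchanged, interactions and multiple factors should work the same way, though I have only checked the simple case shown here.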