user33939 - 1 month ago 10

R Question

My problem is with the

`predict()`

Using the predictions coming from my model, I would like to visualize how my significant factors (and their interaction) affect the probability of my response variable.

My model:

`m1 <-glm ( mating ~ behv * pop +`

I(behv^2) * pop + condition,

data=data1, family=binomial(logit))

Significant effects after running the glm:

Although I have read the help pages, previous answers to similar questions, tutorials etc., I couldn't figure out how to structure the

`newdata=`

`predict()`

Answer

Really the only thing that `predict`

expects is that the names of the columns in `newdata`

exactly match the column names used in the formula. And you must have values for each of your predictors. Here's some sample data.

```
#sample data
set.seed(16)
data <- data.frame(
mating=sample(0:1, 200, replace=T),
pop=sample(letters[1:4], 200, replace=T),
behv = scale(rpois(200,10)),
condition = scale(rnorm(200,5))
)
data1<-data[1:150,] #for model fitting
data2<-data[51:200,-1] #for predicting
```

Then this will fit the model using `data1`

and predict into `data2`

```
model<-glm ( mating ~ behv * pop +
I(behv^2) * pop + condition,
data=data1,
family=binomial(logit))
predict(model, newdata=data2, type="response")
```

Using `type="response"`

will give you the predicted probabilities.

Now to make predictions, you don't have to use a subset from the exact same `data.frame`

. You can create a new one to investigate a particular range of values (just make sure the column names match up. So in order to explore `behv*pop2`

(or `behv*popb`

in my sample data), I might create a data.frame like this

```
popbbehv<-data.frame(
pop="b",
behv=seq(from=min(data$behv), to=max(data$behv), length.out=100),
condition = mean(data$condition)
)
```

Here I fix `pop="b"`

so i'm only looking at the `pop`

, and since I have to supply `condition`

as well, i fix that at the mean of the original data. (I could have just put in 0 since the data is centered and scaled.) Now I specify a range of `behv`

values i'm interested in. Here i just took the range of the original data and split it into 100 regions. This will give me enough points to plot. So again i use `predict`

to get

```
popbbehvpred<-predict(model, newdata=popbbehv, type="response")
```

and then I can plot that with

```
plot(popbbehvpred~behv, popbbehv, type="l")
```

Although nothing is significant in my fake data, we can see that higher behavior values seem to result in less mating for population B.