ahmed_Mobile - 6 months ago 31

R Question

I want to write the code for plotting my logistic regression model, i.e., the "S"-shape logistic curve. How could that be done please as I have have **two independent covariates**? I'm attaching my data set, and the code for my model. Thank you in advance.

`239 0.72 1`

324.6 0.83 1

331.8 0.95 1

334.3 0.83 1

259.7 0.89 1

212.3 0.88 1

204.7 0.65 1

253.86 0.75 1

258.94 0.85 1

329.66 0.95 0

469.68 1.46 0

459.74 1.11 0

293.2 0.64 0

297.88 0.98 0

267.9 0.82 0

374.1 1.29 0

333.62 0.74 0

dat <- read.table("data.txt")

colnames(dat)<-c("press","v","gender")

# logostic regression

dat$gender <- factor(dat$gender)

mylogit<- glm(gender~press+v,data=dat,family="binomial")

summary(mylogit)

######## the code below are irrelevant to making plot, ignore if you want

mylogit$fitted.values

newdat <- data.frame(t(c(300,0.1)))

colnames(newdat)<-c("press","v")

# this is your new dataset, we name it as "newdat"

pred <- predict(mylogit,newdata = newdat,type="response")

pred # the probability of being in class 1 will stored in this object

pred <- predict(mylogit,newdata = dat,type="response")

pred # the probability of being in class 1 will stored in this object

# accuracy

dat$pred <- 0

factor(dat$pred)

dat$pred[which(pred>0.5)] <- 1

table(dat$gender,dat$pred)

Answer

You have 2 continuous, non-categorical variables, so the logistic curve will be a 3D curve. I will offer you two ways for presentation.

- use
`persp`

function to produce a real 3D smooth curve; - fix
`v`

at a number of values, then produce a number of 2D logistic curve (which you called "S"-shape curve).

**3D curve**

```
press_grid <- seq(200, 480, by = 5)
v_grid <- seq(0.6, 1.5, by = 0.1)
newdat <- data.frame(press = rep(press_grid, times = length(v_grid)), v = rep(v_grid, each = length(press_grid)))
pred <- predict.glm(mylogit, newdata = newdat, type="response")
z <- matrix(pred, length(press_grid))
persp(press_grid, v_grid, z, xlab = "pressure", ylab = "velocity", zlab = "predicted probability", main = "logistic curve (3D)", theta = 30, phi = 20)
```

You need to first generate a 2D grid. The `newdat`

holds this grid, and you can do `plot(newdat)`

to see this grid. Then prediction will take place on this grid, by calling `predict.glm(..., type = "response")`

. The result `pred`

is a vector. To plot it, cast it to a matrix `z`

, then invoke `persp`

to make 3D plot. `xlab`

, `ylab`

and `zlab`

are labels for three axis. The parameters `theta`

and `phi`

are used to tweak your viewing angles.

In the above, the marginal grid for `press`

and `v`

are based on the range of your original data: `range(dat$press)`

and `range(dat$v)`

. We don't make prediction beyond this range too far. But even within this range, you only have 17 observations. So you need still be sceptical on the plot.

Here is the curve:

**2D curves**

This toy function is useful for making a 2D curve, with `v`

fixed as some level:

```
curve_2D_fix_v <- function(model, v = 1, press_grid = seq(200, 480, by = 5), add = FALSE, col = "black") {
newdat <- data.frame(press = press_grid, v = v)
pred <- predict.glm(model, newdat, type = "response")
if (add) lines(press_grid, pred, col = col) else {
plot(press_grid, pred, xlab = "pressure", ylab = "predicted probability", type = "l", col = col, main = "logistic curve (2D)")
abline(h = c(0, 0.5, 1), lty = 2, col = col)
}
}
```

If `add = FALSE`

, it opens a new plotting window; while it is `TRUE`

, it plots on previous window (but it is your duty to make sure there is such a window!) The 2D plot gives more information, because you can add a horizontal line at 0, 0.5 and 1.

Let's have a go:

```
curve_2D_fix_v(mylogit, v = 0.4, add = FALSE, col = "black")
curve_2D_fix_v(mylogit, v = 0.6, add = TRUE, col = "red")
curve_2D_fix_v(mylogit, v = 0.8, add = TRUE, col = "green")
curve_2D_fix_v(mylogit, v = 1, add = TRUE, col = "blue")
curve_2D_fix_v(mylogit, v = 1.2, add = TRUE, col = "cyan")
curve_2D_fix_v(mylogit, v = 0.4, add = TRUE, col = "yellow")
```

Here is the curve:

**Discussion**

In both plots, we see that the relationship between `gender`

(predicted probability) and `v`

(velocity) is not very strong. In 2D plot, almost all values of `v`

produce the same curve. On the other hand, `press`

(pressure) is a strong effect.

Back to your model:

```
> summary(mylogit)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.08326 4.45463 1.815 0.0696 .
press -0.02575 0.01618 -1.591 0.1115
v -0.15385 4.83824 -0.032 0.9746
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

You can see that `v`

is not significant at all! While strictly speaking, `press`

is also not significant at 0.1 level. **So this is a very weak model**. I suggest you drop variable `v`

and do the model again, using `press`

as the only variable.