ahmed_Mobile ahmed_Mobile - 4 months ago 14
R Question

Producing logistic curve for my logistic regression model

I want to write the code for plotting my logistic regression model, i.e., the "S"-shape logistic curve. How could that be done please as I have have two independent covariates? I'm attaching my data set, and the code for my model. Thank you in advance.

239 0.72 1
324.6 0.83 1
331.8 0.95 1
334.3 0.83 1
259.7 0.89 1
212.3 0.88 1
204.7 0.65 1
253.86 0.75 1
258.94 0.85 1
329.66 0.95 0
469.68 1.46 0
459.74 1.11 0
293.2 0.64 0
297.88 0.98 0
267.9 0.82 0
374.1 1.29 0
333.62 0.74 0


dat <- read.table("data.txt")
colnames(dat)<-c("press","v","gender")

# logostic regression
dat$gender <- factor(dat$gender)
mylogit<- glm(gender~press+v,data=dat,family="binomial")
summary(mylogit)

######## the code below are irrelevant to making plot, ignore if you want

mylogit$fitted.values

newdat <- data.frame(t(c(300,0.1)))
colnames(newdat)<-c("press","v")
# this is your new dataset, we name it as "newdat"
pred <- predict(mylogit,newdata = newdat,type="response")
pred # the probability of being in class 1 will stored in this object

pred <- predict(mylogit,newdata = dat,type="response")
pred # the probability of being in class 1 will stored in this object
# accuracy
dat$pred <- 0
factor(dat$pred)
dat$pred[which(pred>0.5)] <- 1

table(dat$gender,dat$pred)

Answer

You have 2 continuous, non-categorical variables, so the logistic curve will be a 3D curve. I will offer you two ways for presentation.

  • use persp function to produce a real 3D smooth curve;
  • fix v at a number of values, then produce a number of 2D logistic curve (which you called "S"-shape curve).

3D curve

press_grid <- seq(200, 480, by = 5)
v_grid <- seq(0.6, 1.5, by = 0.1)
newdat <- data.frame(press = rep(press_grid, times = length(v_grid)), v = rep(v_grid, each = length(press_grid)))
pred <- predict.glm(mylogit, newdata = newdat, type="response")
z <- matrix(pred, length(press_grid))
persp(press_grid, v_grid, z, xlab = "pressure", ylab = "velocity", zlab = "predicted probability", main = "logistic curve (3D)", theta = 30, phi = 20)

You need to first generate a 2D grid. The newdat holds this grid, and you can do plot(newdat) to see this grid. Then prediction will take place on this grid, by calling predict.glm(..., type = "response"). The result pred is a vector. To plot it, cast it to a matrix z, then invoke persp to make 3D plot. xlab, ylab and zlab are labels for three axis. The parameters theta and phi are used to tweak your viewing angles.

In the above, the marginal grid for press and v are based on the range of your original data: range(dat$press) and range(dat$v). We don't make prediction beyond this range too far. But even within this range, you only have 17 observations. So you need still be sceptical on the plot.

Here is the curve:

3D

2D curves

This toy function is useful for making a 2D curve, with v fixed as some level:

curve_2D_fix_v <- function(model, v = 1, press_grid = seq(200, 480, by = 5), add = FALSE, col = "black") {
  newdat <- data.frame(press = press_grid, v = v)
  pred <- predict.glm(model, newdat, type = "response")
  if (add) lines(press_grid, pred, col = col) else {
    plot(press_grid, pred, xlab = "pressure", ylab = "predicted probability", type = "l", col = col, main = "logistic curve (2D)")
    abline(h = c(0, 0.5, 1), lty = 2, col = col)
    }
  }

If add = FALSE, it opens a new plotting window; while it is TRUE, it plots on previous window (but it is your duty to make sure there is such a window!) The 2D plot gives more information, because you can add a horizontal line at 0, 0.5 and 1.

Let's have a go:

curve_2D_fix_v(mylogit, v = 0.4, add = FALSE, col = "black")
curve_2D_fix_v(mylogit, v = 0.6, add = TRUE, col = "red")
curve_2D_fix_v(mylogit, v = 0.8, add = TRUE, col = "green")
curve_2D_fix_v(mylogit, v = 1, add = TRUE, col = "blue")
curve_2D_fix_v(mylogit, v = 1.2, add = TRUE, col = "cyan")
curve_2D_fix_v(mylogit, v = 0.4, add = TRUE, col = "yellow")

Here is the curve:

2D


Discussion

In both plots, we see that the relationship between gender (predicted probability) and v (velocity) is not very strong. In 2D plot, almost all values of v produce the same curve. On the other hand, press (pressure) is a strong effect.

Back to your model:

> summary(mylogit)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  8.08326    4.45463   1.815   0.0696 .
press       -0.02575    0.01618  -1.591   0.1115  
v           -0.15385    4.83824  -0.032   0.9746  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can see that v is not significant at all! While strictly speaking, press is also not significant at 0.1 level. So this is a very weak model. I suggest you drop variable v and do the model again, using press as the only variable.

Comments