user3810226 user3810226 - 3 months ago 9
R Question

Error with Heatmap in R

I am trying to make heatmaps in R. Basically, there are two surveys and I was trying to map whether someone answered or did not answer a question. I was able to make one for the following using the code listed below:

x1 <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
x2 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
x3 <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
x4 <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
x5 <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
x6 <- c(0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0)

x <- rbind(x1, x2, x3, x4, x5, x6)
hv <- heatmap(t(x), col = c("Forestgreen", "Darkorange2"), margins = c(4, 12), Colv = NA, Rowv = NA, scale = "column", xlab ="Person", ylab ="", main = "", labCol=c("1", "2", "3", "4", "5", "6"))
legend("topright", c("Non-Missing", "Missing"), col=c("Forestgreen", "Darkorange2"), bty="n", fill=c("Forestgreen", "Darkorange2"))


While the heatmap thus generated is fine, the one I try to create for the second survey is off. See code below:

y1 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
y2 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
y3 <- rep(c(0, 1), c(34, 2))
y4 <- c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y5 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
y6 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y7 <- rep(c(0, 1), each=18)
y8 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y9 <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1)
y10 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y11 <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y12 <- c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
y13 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)


y <- rbind(y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, y13)
hv <- heatmap(t(y), col = c("Forestgreen", "Darkorange2"), margins = c(4, 12), Colv = NA, Rowv = NA, scale = "column", xlab ="Person", ylab ="", main = "")
legend("topright", c("Non-Missing", "Missing"), col=c("Forestgreen", "Darkorange2"), bty="n", fill=c("Forestgreen", "Darkorange2"))


I don't see why there is a white line essentially across y2. Especially when there was no issue with the first one. Any insight would be helpful. Thanks!

Answer

As noted in the comment, the problem here is that the values in y2 are all 1. You have instructed the heatmap function to scale based on the column values (scale = "column"). Since there's no variance in the second column, there's nothing to scale, and so you just get back nothing. The heatmap function should probably throw an error or a warning about this, but for whatever reason it isn't doing that.

The good news is that this is an easy fix. If you change scaling from "column" to "none", the issue resolves itself. In fact, interestingly, the other columns when scale = "column" appear to be wrong as well - I'm not sure why, especially as the problem goes away when you introduce variance in y2.

y1 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
y2 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
y3 <- rep(c(0, 1), c(34, 2))
y4 <- c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y5 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
y6 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y7 <- rep(c(0, 1), each=18)
y8 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y9 <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1)
y10 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y11 <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y12 <- c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
y13 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

y <- rbind(y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, y13) 

hv <- heatmap(t(y), col = c("Forestgreen", "Darkorange2"), margins = c(4, 12), Colv = NA, Rowv = NA, scale = "none", xlab ="Person", ylab ="", main = "")
legend("topright", c("Non-Missing", "Missing"), col=c("Forestgreen", "Darkorange2"), bty="n", fill=c("Forestgreen", "Darkorange2"))

The help on the scale argument to heatmap states:

character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. The default is "row" if symm false, and "none" otherwise.

Centering and scaling on the column or row is accomplished by this code from the heatmap function:

else if (scale == "column") {
    x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 2L, sd, na.rm = na.rm)
    x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
}

Using some smaller example data makes a good demonstration.

x1 <- c(1,2,3)
x2 <- c(4,5,4)
x3 <- c(1,1,1)

data_mat <- cbind(x1,x2,x3)
print(data_mat)
     x1 x2 x3
[1,]  1  4  1
[2,]  2  5  1
[3,]  3  4  1
data_mat <- sweep(x = data_mat,MARGIN = 2,STATS = colMeans(data_mat))
print(data_mat)
     x1 x2 x3
[1,]  1  4  1
[2,]  2  5  1
[3,]  3  4  1
sd_data_mat <- apply(X = data_mat, MARGIN = 2, FUN = sd)
print(sd_data_mat)
     x1         x2 x3
[1,] -1 -0.3333333  0
[2,]  0  0.6666667  0
[3,]  1 -0.3333333  0
data_mat <- sweep(x = data_mat,MARGIN = 2,STATS = sd_data_mat,FUN = "/")
print(data_mat)
     x1         x2  x3
[1,] -1 -0.5773503 NaN
[2,]  0  1.1547005 NaN
[3,]  1 -0.5773503 NaN

You can see that in x3, you end up with NaN, as you are dividing 0 by 0. This ends up getting passed for plotting later, which causes the column to be missing.