Dr.Fykos - 1 month ago
R Question

ggplot2 scale colours for heatmap

I have a dataset with positive and negative values and I am trying to generate a heatmap in ggplot which will have different color gradients for all the values less than zero and all the values greater than zero.

I managed to get this working with the code below, but the legend spans the whole colour range and doesn't represent the data well. I also tried normalizing the data to between 0 and 1, but that produces a continuous colour scale with just one colour.

You can find the data here http://pastebin.com/gVHBcVc6

I would appreciate any other ideas.

mylimits <- c(round(min(dat$ratio[!is.na(dat$ratio) > 0])),
              round(min(dat$ratio[!is.na(dat$ratio) > 0])) / 2,
              -0.2,
              0,
              0.2,
              round(max(dat$ratio[!is.na(dat$ratio) > 0])) / 2,
              round(max(dat$ratio[!is.na(dat$ratio) > 0])))


ggplot(data = dat, aes(x = ACC, y = variable)) +
  geom_tile(aes(fill = as.numeric(sprintf("%1.2f", 100 * ratio))), colour = 'white') +
  geom_text(aes(label = text), size = 2) +
  scale_fill_gradientn(colours = c('red', 'yellow', 'cyan', 'blue'),
                       values = rescale(mylimits)) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1, color = "black"),
        legend.title = element_blank(), legend.position = "top",
        legend.key.size = unit(2.5, "cm"))

Answer

In scale_fill_gradientn(), length(colours) and length(values) must be the same. When you define the limits, it is better to calculate where zero falls once the data range is converted to 0-1, and to change colours at exactly that position. I also treated the values below -100 as outliers and gave them their own colour segment. (Edit: if you give nbin, an argument of guide_colourbar(), a large value, the legend can show the abrupt change at zero.)
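To make the conversion to the 0-1 scale concrete, here is a minimal worked sketch with a made-up range of -200 to 50 (hypothetical numbers, not taken from the pastebin data). The values argument of scale_fill_gradientn() gives positions between 0 and 1, so a data value v lands at (v - min) / (max - min):

```r
r_range <- c(-200, 50)   # hypothetical min and max of the fill values

# position of a data value on the 0-1 scale (to01 is an illustrative helper name)
to01 <- function(v) (v - r_range[1]) / diff(r_range)

to01(0)      # 0.8: zero sits 80% of the way along the colour bar
to01(-100)   # 0.4
```

With this range, the colours for negative values have to be placed between 0 and 0.8, and the colours for positive values between 0.8 and 1.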

library(ggplot2)

## add the fill values as a column (for convenience, not necessary)
dat$ratio2 <- as.numeric(sprintf("%1.2f", 100 * dat$ratio))

## range of the fill values
r_range <- range(dat$ratio2, na.rm = TRUE)

## positions of 0 and -100 after rescaling the range to 0-1
## (this assumes the minimum of ratio2 is below -100)
zero_val <- 1 / diff(r_range) * -r_range[1]                # 0
minus100_val <- 1 / diff(r_range) * - (r_range[1] + 100)   # -100

## define mylim and mycol (basic idea: c(..., zero_val - 1.0E-6, zero_val + 1.0E-6, ...))
mylim <- c(0, seq(minus100_val, zero_val - 1.0E-6, length.out = 3),   # minus
           seq(zero_val + 1.0E-6, 1, length.out = 3))                 # plus

mycol <- c("navy", "blue", "cyan", "lightcyan",                    # minus
           "yellow", "red", "red4")                                # plus

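If the same construction is needed for several datasets, it could be wrapped in a small function. This is only a sketch: diverging_values() and its arguments are hypothetical names, not part of ggplot2, and the guard at the top encodes the assumption that the data go below the outlier cut and above zero.

```r
# Hypothetical helper: builds the `values` vector for scale_fill_gradientn()
# so that the colour break sits at zero and values below `cut` get their own segment.
diverging_values <- function(r_range, cut = -100, n = 3, eps = 1e-6) {
  # the construction only makes sense if min < cut < 0 < max
  stopifnot(r_range[1] < cut, cut < 0, r_range[2] > 0)
  to01 <- function(v) (v - r_range[1]) / diff(r_range)
  c(0,
    seq(to01(cut), to01(0) - eps, length.out = n),   # negative side
    seq(to01(0) + eps, 1, length.out = n))           # positive side
}

vals <- diverging_values(c(-150, 60))   # made-up range for illustration
length(vals)        # 7, so it pairs with seven colours
is.unsorted(vals)   # FALSE: the positions are increasing
```

The result has 1 + 2 * n entries, so the colour vector needs the same number of colours.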

ggplot(data = dat, aes(x = ACC, y = variable)) +
  geom_tile(aes(fill = ratio2), colour = 'white') +
  #geom_text(aes(label = text), size = 2) +
  scale_fill_gradientn(colours = mycol, values = mylim) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1, color="black"), 
        legend.title = element_blank(), legend.position="top", legend.key.size = unit(2.5, "cm"))
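The edit note mentions nbin, but the plotting call does not show it. A minimal sketch of how it could be passed, assuming the legend is set through the guide argument of scale_fill_gradientn(); the toy data frame, the four colours, and the value 1000 are made up for illustration:

```r
library(ggplot2)

# Hypothetical toy data standing in for the pastebin dataset
toy <- expand.grid(ACC = paste0("A", 1:6), variable = paste0("v", 1:4))
toy$ratio2 <- seq(-150, 60, length.out = nrow(toy))

r_range  <- range(toy$ratio2, na.rm = TRUE)
zero_val <- (0 - r_range[1]) / diff(r_range)   # position of 0 on the 0-1 scale

# simplest two-sided version of the idea: switch colour family right at zero
toy_lim <- c(0, zero_val - 1e-6, zero_val + 1e-6, 1)
toy_col <- c("blue", "lightcyan", "yellow", "red")

p <- ggplot(toy, aes(x = ACC, y = variable)) +
  geom_tile(aes(fill = ratio2), colour = "white") +
  scale_fill_gradientn(colours = toy_col, values = toy_lim,
                       guide = guide_colourbar(nbin = 1000)) +  # many bins for the legend bar
  theme(legend.position = "top")

p
```

With few bins the colour bar is drawn from a coarse sample of the scale, so the jump at zero can look smeared; a larger nbin samples it more finely.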

## to check: colour the tiles by sign only and compare with the plot above
ggplot(data = dat, aes(x = ACC, y = variable)) +
  geom_tile(aes(fill = cut(ratio2, breaks = c(-Inf, 0, Inf))), colour = 'white')
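The check can also be done numerically. The sketch below uses scales::gradient_n_pal(), which as far as I know is the palette builder behind scale_fill_gradientn(), to look up the colours just below and just above zero. The range -150 to 60 is made up, and to01() is an illustrative helper name:

```r
library(scales)

r_range <- c(-150, 60)                                 # hypothetical data range
to01 <- function(v) (v - r_range[1]) / diff(r_range)   # data value -> 0-1 position

mylim <- c(0, seq(to01(-100), to01(0) - 1e-6, length.out = 3),
           seq(to01(0) + 1e-6, 1, length.out = 3))
mycol <- c("navy", "blue", "cyan", "lightcyan", "yellow", "red", "red4")

pal <- gradient_n_pal(mycol, values = mylim)

# colours for -1 and +1; they should come from different sides of the break
pal(to01(c(-1, 1)))
```

If the two returned hex codes were nearly identical, the break would not be sitting at zero.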
