Nick Criswell - 6 months ago
R Question

Correlation Matrix - tidyr gather v. reshape2 melt

I would like to use ggplot2 to make an upper triangle correlation matrix like this one. I can replicate that one just fine, but for some reason I'm stuck on really wanting to convert the reshape2 functions to tidyr ones. I would think that I could use gather in place of melt, but that is not working.

Original results using reshape2:



library(reshape2)
library(ggplot2)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cormat <- round(cor(mydata), 2)
melted_cormat <- melt(cormat)

# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}

upper_tri <- get_upper_tri(cormat)

melted_cormat <- melt(upper_tri, na.rm = TRUE)

ggplot(data = melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile()



My attempt at this using gather from tidyr:



library(tidyverse)

# first, the correlation matrix
cor_base <- round(cor(mydata), 2)
# now the upper triangle
cor_base[lower.tri(cor_base)] <- NA
cor_tri <- as.data.frame(cor_base) %>%
  rownames_to_column("Var2") %>%
  gather(key = Var1, value = value, -Var2, na.rm = TRUE) %>%
  as.data.frame()

ggplot(data = cor_tri, aes(x = Var2, y = Var1, fill = value)) +
  geom_tile()



The values are all the same, but some change in order occurred that is making this look wrong. A check with identical() doesn't return TRUE, but the values of the two data frames seem to be the same...

> identical(cor_tri, melted_cormat)
[1] FALSE
> dim(cor_tri)
[1] 21 3
> dim(melted_cormat)
[1] 21 3
> sum(cor_tri == melted_cormat)
[1] 63


Any thoughts on this, or should I just go ahead and load reshape2 to accomplish what I'm going for?

Thanks.

Answer

Essentially, the difference is in the types of Var1 and Var2 between the reshape2 and tidyr versions. reshape2's melt() returns them as factors whose levels keep the order of the correlation matrix: "mpg", "disp", "hp", "drat", "wt", "qsec". tibble::rownames_to_column() and gather() return character columns, which ggplot2 lays out in alphabetical order: "disp", "drat", "hp", "mpg", "qsec", "wt". The two versions therefore put the variables on the axes in different orders, which changes how the plot renders.
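The type difference is easy to see in isolation. The following is a minimal base R sketch, not part of the original code; the names vars, as_chr and as_fct are made up for illustration. It assumes that a character vector mapped to a discrete ggplot2 axis is ordered the way factor() would order it by default, that is, alphabetically.

```r
# Variable names in the order they appear in the correlation matrix
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

as_chr <- vars                         # character, like the tidyr attempt
as_fct <- factor(vars, levels = vars)  # factor with explicit levels, like melt()

# Default factor levels of a character vector are sorted alphabetically
chr_order <- levels(factor(as_chr))
chr_order
# "disp" "drat" "hp" "mpg" "qsec" "wt"

# Explicit levels keep the matrix order
fct_order <- levels(as_fct)
fct_order
# "mpg" "disp" "hp" "drat" "wt" "qsec"

# Same values element by element, yet not identical objects
same_values <- all(as_chr == as_fct)       # TRUE
same_object <- identical(as_chr, as_fct)   # FALSE
```

This also explains the checks in the question: == compares the labels as text, so every cell matches, while identical() also compares the column types and attributes.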

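To resolve this, add a dplyr::mutate() step that builds Var1 with base::factor(row.names(.), ...) and explicitly sets the levels to the original order of cor_base's row names. Passing factor_key = TRUE to gather() does the same for Var2, storing the key as a factor that keeps the original column order. Also, your Var1 and Var2 were reversed.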

cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

cor_tri <- as.data.frame(cor_base) %>% 
  mutate(Var1 = factor(row.names(.), levels=row.names(.))) %>% 
  gather(key = Var2, value = value, -Var1, na.rm = TRUE, factor_key = TRUE) 

ggplot(data = cor_tri, aes(Var2, Var1, fill = value)) + 
  geom_tile()

Cor Matrix Plot Output
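For readers on tidyr 1.0.0 or later, where gather() is superseded by pivot_longer(), the same idea might look like the sketch below. This is an assumption about how one could port the fix, not code from the original answer; var_order is a made-up helper name. Here the factor levels are set after reshaping instead of through factor_key.

```r
library(dplyr)
library(tidyr)
library(tibble)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

# Hypothetical helper: the variable order of the correlation matrix
var_order <- rownames(cor_base)

cor_tri <- as.data.frame(cor_base) %>%
  rownames_to_column("Var1") %>%
  pivot_longer(cols = -Var1, names_to = "Var2", values_to = "value",
               values_drop_na = TRUE) %>%
  # Restore the matrix order on both axes
  mutate(Var1 = factor(Var1, levels = var_order),
         Var2 = factor(Var2, levels = var_order))
```

The resulting cor_tri can be passed to the same ggplot() call as above. The rows come out in a different order than with gather(), but the plot depends only on the factor levels.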


Also, for you or future readers, here is a base::reshape() version that also resolves the factor level issue above:

cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

cor_base_df <- transform(as.data.frame(cor_base),
                         Var1 = factor(row.names(cor_base), levels=row.names(cor_base)))

cor_long <- subset(reshape(cor_base_df, idvar=c("Var1"), 
                           varying = c(1:(ncol(cor_base_df)-1)), v.names="value",
                           timevar = "Var2", 
                           times = factor(row.names(cor_base), levels=row.names(cor_base)),
                           new.row.names = 1:100,
                           direction = "long"), !is.na(value))

ggplot(data = cor_long, aes(Var2, Var1, fill = value)) + 
  geom_tile()
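
A shorter base R route, offered here only as a sketch and not as part of the original answer, is to go through as.table(). as.data.frame() on a table returns one row per cell, with Var1 and Var2 as factors whose levels follow the matrix's dimnames, so the order is preserved without extra work. The name cor_long2 is made up for illustration.

```r
mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

# One row per matrix cell; Var1/Var2 are factors in dimnames order
cor_long2 <- as.data.frame(as.table(cor_base), responseName = "value")

# Drop the blanked-out lower triangle
cor_long2 <- subset(cor_long2, !is.na(value))
```

cor_long2 has the same three columns (Var1, Var2, value) and can be plotted with the same ggplot() call.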