Nick Criswell - 1 year ago 195
R Question

# Correlation Matrix - tidyr gather v. reshape2 melt

I would like to use `ggplot2` to make an upper triangle correlation matrix like this one. I can replicate that one just fine, but for some reason I'm stuck on really wanting to convert the `reshape2` functions to `tidyr` ones. I would think that I could use `gather` in place of `melt`, but that is not working.

### Original Results using `reshape2`

``````
library(reshape2)
library(ggplot2)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cormat <- round(cor(mydata), 2)

# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}

upper_tri <- get_upper_tri(cormat)

# Melt to long format, dropping the NA (lower triangle) cells
melted_cormat <- melt(upper_tri, na.rm = TRUE)

ggplot(data = melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile()
``````

### My attempt at this using `gather` from `tidyr`.

``````
library(tidyverse)

# First, the correlation matrix
cor_base <- round(cor(mydata), 2)
# Now keep only the upper triangle
cor_base[lower.tri(cor_base)] <- NA
cor_tri <- as.data.frame(cor_base) %>%
  rownames_to_column("Var2") %>%
  gather(key = Var1, value = value, -Var2, na.rm = TRUE) %>%
  as.data.frame()

ggplot(data = cor_tri, aes(x = Var2, y = Var1, fill = value)) +
  geom_tile()
``````

The values are all the same, but some change in order occurred that is making this look wrong. A check of `identical` doesn't return `TRUE`, but the values of the two data frames seem to be the same...

``````
> identical(cor_tri, melted_cormat)
[1] FALSE
> dim(cor_tri)
[1] 21  3
> dim(melted_cormat)
[1] 21  3
> sum(cor_tri == melted_cormat)
[1] 63
``````

Any thoughts on this, or should I just go ahead and load `reshape2` to accomplish what I'm going for?

Thanks.

Essentially, the difference is in the types of Var1 and Var2: `factor` in the reshape2 version and `character` in the tidyr version. reshape2's `melt()` returns factors whose levels keep the order of the correlation matrix: `"mpg", "disp", "hp", "drat", "wt", "qsec"`. With tidyr, `tibble::rownames_to_column()` and `gather()` produce character columns, which ggplot2 orders alphabetically: `"disp", "drat", "hp", "mpg", "qsec", "wt"`. The two versions therefore place the variables in a different order on the axes, which changes how the plot renders.
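
A quick way to see this is to compare a factor column against a character column holding the same labels. The following is a minimal base R sketch with two small, made-up data frames (`with_factor` and `with_character` are illustrative names, not objects from the question). It assumes that ggplot2 orders a character column the same way `factor()` orders its default levels:

```r
# Variable names in the order they appear in the correlation matrix
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Illustrative data frames: same labels and values, different column types
with_factor <- data.frame(
  Var1 = factor(vars, levels = vars),
  value = seq_along(vars)
)
with_character <- data.frame(
  Var1 = vars,
  value = seq_along(vars),
  stringsAsFactors = FALSE
)

# Element-wise comparison looks only at the labels and values
all(with_factor == with_character)

# identical() also compares column types, so the two data frames differ
identical(with_factor, with_character)
sapply(with_factor, class)
sapply(with_character, class)

# Level order: explicit levels keep matrix order, default levels are sorted
levels(with_factor$Var1)
levels(factor(with_character$Var1))
```

The element-wise check should report a full match while `identical()` reports a difference, which is the same pattern as in the question's console output.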

To resolve this, add a `dplyr::mutate` line that uses `base::factor(row.names(.), ...)` and explicitly sets the levels to the original arrangement of `cor_base`'s `row.names()`. Also, your Var1 and Var2 were reversed.

``````
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

cor_tri <- as.data.frame(cor_base) %>%
  mutate(Var1 = factor(row.names(.), levels = row.names(.))) %>%
  gather(key = Var2, value = value, -Var1, na.rm = TRUE, factor_key = TRUE)

ggplot(data = cor_tri, aes(Var2, Var1, fill = value)) +
  geom_tile()
``````
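
For readers on a newer tidyr, the same idea can probably be expressed with `pivot_longer()`, which supersedes `gather()`. This is only a sketch, assuming tidyr 1.0.0 or later is installed; `var_order` is an illustrative helper name, and the rows of the result come out in a different order than with `gather()`, which should not matter for the plot because the factor levels control the axis order:

```r
library(dplyr)
library(tidyr)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

# Illustrative helper: variable order as it appears in the correlation matrix
var_order <- rownames(cor_base)

cor_tri <- as.data.frame(cor_base) %>%
  # Store the row names as a factor that keeps the matrix order
  mutate(Var1 = factor(var_order, levels = var_order)) %>%
  # Long format; drop the NA cells from the lower triangle
  pivot_longer(-Var1, names_to = "Var2", values_to = "value",
               values_drop_na = TRUE) %>%
  # Column names arrive as character, so set the same level order explicitly
  mutate(Var2 = factor(Var2, levels = var_order))
```

The resulting `cor_tri` can be passed to the same `ggplot()` call as above.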

Also, for you or future readers, here is a `base::reshape` version that also resolves the factor level issue above:

``````
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

cor_base_df <- transform(as.data.frame(cor_base),
                         Var1 = factor(row.names(cor_base), levels = row.names(cor_base)))

cor_long <- subset(reshape(cor_base_df, idvar = c("Var1"),
                           varying = c(1:(ncol(cor_base_df) - 1)), v.names = "value",
                           timevar = "Var2",
                           times = factor(row.names(cor_base), levels = row.names(cor_base)),
                           new.row.names = 1:100,
                           direction = "long"), !is.na(value))

ggplot(data = cor_long, aes(Var2, Var1, fill = value)) +
  geom_tile()
``````