Viet-Thi Tran - 2 months ago
R Question

R ggplot Coincidence plot

I'm working on a database of patients with multiple conditions and am trying to create a graphic showing associations between these conditions. More specifically, I'd like to obtain something like this (a "coincidence" plot).
My data is organized as:

mal1 mal2 mal3 etc.
0 0 1
1 1 0
0 1 0 etc.


I create the data as I want it to be shown using the following code:

X <- as.matrix(hdat2)
out <- crossprod(X)
diag(out) <- 0


And I create the plot with:

library(reshape2)  # melt()
library(ggplot2)

out <- melt(out)
out$value[which(out$value == 0)] <- NA  # zero counts become NA so no point is drawn for them
g <- ggplot(data.frame(out), aes(Var1, Var2)) + geom_point(aes(size = value), colour = "black") + theme_bw() + xlab("") + ylab("")
g + scale_size_continuous(range = c(2, 10))


As a result, I obtain this plot.

I'd like to hide the symmetric half of the plot, which I think is misleading (similar to how, in correlation matrices, I can hide the symmetric half). However, I'm not sure how to do it.

Could anyone help?
Thanks

Answer

First, some reproducible data:

set.seed(1)  # arbitrary seed, added so the random draws are repeatable
mat <-
  data.frame(
    malA = sample(0:1, 100, TRUE, c(0.2,0.8))
    , malB = sample(0:1, 100, TRUE, c(0.3,0.7))
    , malC = sample(0:1, 100, TRUE, c(0.4,0.6))
    , malD = sample(0:1, 100, TRUE, c(0.5,0.5))
  )

out <- crossprod(as.matrix(mat))   
diag(out) <- 0
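
As a quick check of what crossprod() returns, here is a minimal sketch using the three example rows from the question (the names X, co and co_offdiag are only illustrative). crossprod(X) is t(X) %*% X, so entry [i, j] counts the patients who have both condition i and condition j, and the diagonal holds the number of patients with each condition.

```r
# Three example patients from the question (columns mal1..mal3)
X <- matrix(c(0, 0, 1,
              1, 1, 0,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("mal1", "mal2", "mal3")))

co <- crossprod(X)   # same as t(X) %*% X
co["mal1", "mal2"]   # number of patients with both mal1 and mal2
diag(co)             # number of patients with each condition

co_offdiag <- co
diag(co_offdiag) <- 0  # drop the self-pairs before plotting
```

The result is symmetric, which is why the full plot shows every pair twice.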

Here is an example that keeps just the half you are interested in, using dplyr:

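library(reshape2)  # melt()
library(dplyr)     # %>%, mutate(), across() (dplyr >= 1.0), filter()
library(ggplot2)

toPlotHalf <-
  melt(out) %>%
  # make sure both Var columns are factors so they can be compared by level index
  mutate(across(starts_with("Var"), factor)) %>%
  # keep only the pairs above the diagonal (row index < column index)
  filter(as.numeric(Var1) < as.numeric(Var2))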

ggplot(toPlotHalf
       , aes(Var1, Var2)) +
  geom_point(aes(size = value), colour = "black") +
  theme_bw() + xlab("") + ylab("") +
  scale_size_continuous(range=c(2,10))
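
If you would rather not depend on dplyr for this step, the same half can probably be picked out in base R with upper.tri(). This is only a sketch: the matrix toy and its counts are made up for illustration, and halfToy is a hypothetical name.

```r
# Toy symmetric co-occurrence matrix (hypothetical counts, illustration only)
toy <- matrix(c(0, 5, 3,
                5, 0, 2,
                3, 2, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("malA", "malB", "malC"),
                              c("malA", "malB", "malC")))

# Positions strictly above the diagonal (row index < column index)
idx <- which(upper.tri(toy), arr.ind = TRUE)

# Long format with the same column names that melt() would give
halfToy <- data.frame(
  Var1  = factor(rownames(toy)[idx[, "row"]], levels = rownames(toy)),
  Var2  = factor(colnames(toy)[idx[, "col"]], levels = colnames(toy)),
  value = toy[idx]
)
```

A data frame built this way has the Var1, Var2 and value columns that the ggplot call above expects, so it can be plotted the same way.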


Note, however, that plotted this way the figure will be dominated by the maladies that are very common. Alternatively, you can present, for each malady, the proportion of its co-occurrences that involve each other malady, as a rough measure of how often people with one malady also have the other (note that the reciprocal points are now not necessarily the same size):

toPlot <-
  prop.table(out, 1) %>%
  melt() %>%
  filter(value > 0)



ggplot(toPlot
       , aes(Var1, Var2)) +
  geom_point(aes(size = value), colour = "black") +
  theme_bw() + xlab("") + ylab("") +
  scale_size_continuous(range=c(2,10))
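
One caveat worth checking: because the diagonal of out was set to 0 before prop.table() is called, each row is divided by the sum of its pairwise co-occurrence counts, not by the number of patients who have that malady. If what you want is the share of patients with one malady who also have the other, a possible sketch is to divide by the diagonal of the cross-product before zeroing it. The data and the names X, co, cond and rowShare below are made up for illustration.

```r
# Four hypothetical patients, three maladies (illustration only)
X <- matrix(c(1, 1, 0,
              1, 0, 0,
              1, 1, 1,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("malA", "malB", "malC")))

co <- crossprod(X)   # co-occurrence counts; diagonal = patients per malady
n_with <- diag(co)

# Dividing a matrix by a vector recycles down the columns, so row i is
# divided by n_with[i]: cond[i, j] = share of patients with i who also have j
cond <- co / n_with
diag(cond) <- NA     # self-pairs are not of interest

# For comparison: row proportions after zeroing the diagonal
co0 <- co
diag(co0) <- 0
rowShare <- prop.table(co0, 1)
```

In this toy data the only patient with malC also has malA, so cond["malC", "malA"] is 1, whereas the row-proportion version gives 0.5 for the same cell.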
