MAPK MAPK - 3 months ago 12
R Question

How to plot three sets of comparative data in R

I have a data frame called mydf with a column GENE_SYMBOL and three columns for different cancers: AML, CLL, and MDS. I want to plot the percentage of each gene in these cancers. What would be a good way to represent this in a plot?

mydf <- structure(list(GENE_SYMBOL = c("NPM1", "DNMT3A", "TET2", "IDH1",
"IDH2"), AML = c("28.00%", "24.00%", "8.00%", "9.00%", "10.00%"
), CLL = c("0.00%", "8.00%", "0.00%", "3.00%", "1.00%"), MDS = c("7.00%",
"28.00%", "7.00%", "10.00%", "3.00%")), .Names = c("GENE_SYMBOL",
"AML", "CLL", "MDS"), row.names = c(NA, 5L), class = "data.frame")

Answer

We can use barplot from base R. First remove the % sign from the percentage columns: loop over the columns with lapply, strip the % with sub, and convert the result to numeric. With the genes as matrix rows, the bars are grouped by cancer, with one bar per gene.

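# Strip the "%" sign and convert every column except GENE_SYMBOL to numeric
mydf[-1] <- lapply(mydf[-1], function(x) as.numeric(sub("[%]", "", x)))
# Rows are genes, so supply one colour per gene (five in total)
barplot(`row.names<-`(as.matrix(mydf[-1]), mydf$GENE_SYMBOL), beside = TRUE,
        legend.text = TRUE, col = c("red", "green", "blue", "yellow", "purple"))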

If we want 'GENE_SYMBOL' on the x-axis, transpose the matrix so that each gene forms a group of three bars, one per cancer:

barplot(t(`row.names<-`(mydf[-1], mydf$GENE_SYMBOL)), beside=TRUE, 
              legend = TRUE, col = c("red", "green", "blue"))
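
The base R steps above can also be wrapped in a small function. This is only a sketch: percent_matrix is a hypothetical helper name (not part of base R), and the axis labels, y-axis limit, and legend title are my own assumptions about how the plot should be annotated. The data is rebuilt from the question so the snippet runs on its own.

```r
# Example data copied from the question (percentages stored as strings).
mydf <- data.frame(
  GENE_SYMBOL = c("NPM1", "DNMT3A", "TET2", "IDH1", "IDH2"),
  AML = c("28.00%", "24.00%", "8.00%", "9.00%", "10.00%"),
  CLL = c("0.00%", "8.00%", "0.00%", "3.00%", "1.00%"),
  MDS = c("7.00%", "28.00%", "7.00%", "10.00%", "3.00%"),
  stringsAsFactors = FALSE
)

# percent_matrix() is a hypothetical helper, not a base R function.
# It strips the "%" sign, converts to numeric, and returns a matrix
# with one row per cancer and one column per gene.
percent_matrix <- function(df, id_col = "GENE_SYMBOL") {
  value_cols <- setdiff(names(df), id_col)
  m <- sapply(df[value_cols],
              function(x) as.numeric(sub("%", "", x, fixed = TRUE)))
  rownames(m) <- df[[id_col]]
  t(m)  # transpose so that genes become the groups on the x-axis
}

pct <- percent_matrix(mydf)

# Grouped bars: one group per gene, one bar per cancer.
barplot(pct, beside = TRUE, legend.text = TRUE,
        col = c("red", "green", "blue"),
        xlab = "Gene", ylab = "Percentage (%)",
        ylim = c(0, 30),
        args.legend = list(x = "topright", title = "Cancer"))
```

Because the helper leaves mydf untouched, it can be called again without the percent columns already having been converted.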

If we are using ggplot2, reshape the data to long format with gather and draw dodged bars:

library(dplyr)
library(tidyr)
library(ggplot2)
gather(mydf, Var, Val, -GENE_SYMBOL) %>%
  mutate(Val = as.numeric(sub("[%]", "", Val))) %>%
  ggplot(aes(x = GENE_SYMBOL, y = Val)) +
  geom_bar(aes(fill = Var), position = "dodge", stat = "identity")
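
In more recent versions of tidyr, gather is superseded by pivot_longer, and geom_col is shorthand for geom_bar with stat = "identity". A minimal sketch of the same plot along those lines, assuming tidyr (1.0.0 or later) and ggplot2 are installed; the column names Cancer and Percent and the axis labels are my own choices, not from the question:

```r
library(tidyr)
library(ggplot2)

# Example data copied from the question (percentages stored as strings).
mydf <- data.frame(
  GENE_SYMBOL = c("NPM1", "DNMT3A", "TET2", "IDH1", "IDH2"),
  AML = c("28.00%", "24.00%", "8.00%", "9.00%", "10.00%"),
  CLL = c("0.00%", "8.00%", "0.00%", "3.00%", "1.00%"),
  MDS = c("7.00%", "28.00%", "7.00%", "10.00%", "3.00%"),
  stringsAsFactors = FALSE
)

# Reshape to long format: one row per gene/cancer pair.
long_df <- pivot_longer(mydf, cols = -GENE_SYMBOL,
                        names_to = "Cancer", values_to = "Percent")

# Strip the "%" sign and convert to numeric.
long_df$Percent <- as.numeric(sub("%", "", long_df$Percent, fixed = TRUE))

# Dodged bars: genes on the x-axis, one bar per cancer.
p <- ggplot(long_df, aes(x = GENE_SYMBOL, y = Percent, fill = Cancer)) +
  geom_col(position = "dodge") +
  labs(x = "Gene", y = "Percentage (%)")
print(p)
```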
