 MAPK -4 years ago 123
R Question

# How to plot three sets of comparative data in R

I have a data frame called `mydf` with a `GENE_SYMBOL` column and three columns for different cancers: `AML`, `CLL`, and `MDS`. I want to plot the percentage of each gene in these cancers. What would be a good way to represent this in a plot?

```r
mydf <- structure(list(GENE_SYMBOL = c("NPM1", "DNMT3A", "TET2", "IDH1",
"IDH2"), AML = c("28.00%", "24.00%", "8.00%", "9.00%", "10.00%"
), CLL = c("0.00%", "8.00%", "0.00%", "3.00%", "1.00%"), MDS = c("7.00%",
"28.00%", "7.00%", "10.00%", "3.00%")), .Names = c("GENE_SYMBOL",
"AML", "CLL", "MDS"), row.names = c(NA, 5L), class = "data.frame")
```

akrun
Answer Source

One option is `barplot` from base R. First strip the `%` from the percent columns: loop over them with `lapply`, use `sub` to remove the `%`, and convert the result to `numeric`.

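```r
# Strip the "%" sign from every column except the first and convert to numeric
mydf[-1] <- lapply(mydf[-1], function(x) as.numeric(sub("[%]", "", x)))
# Rows are genes, so each cancer forms a group with one bar per gene (five colours)
barplot(`row.names<-`(as.matrix(mydf[-1]), mydf$GENE_SYMBOL), beside = TRUE,
        legend = TRUE, col = c("red", "green", "blue", "yellow", "purple"))
```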
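The conversion step can also be written out and checked on its own before plotting. The following is only a sketch: the names `pct_cols` and `mat` are made up for illustration, and the data frame is rebuilt here so the snippet runs by itself.

```r
# Rebuild the example data from the question (percentages stored as strings)
mydf <- data.frame(
  GENE_SYMBOL = c("NPM1", "DNMT3A", "TET2", "IDH1", "IDH2"),
  AML = c("28.00%", "24.00%", "8.00%", "9.00%", "10.00%"),
  CLL = c("0.00%", "8.00%", "0.00%", "3.00%", "1.00%"),
  MDS = c("7.00%", "28.00%", "7.00%", "10.00%", "3.00%"),
  stringsAsFactors = FALSE
)

# pct_cols is a hypothetical helper name: every column except the gene label
pct_cols <- setdiff(names(mydf), "GENE_SYMBOL")

# Remove the literal "%" and convert each percent column to numeric
mydf[pct_cols] <- lapply(mydf[pct_cols], function(x) {
  as.numeric(sub("%", "", x, fixed = TRUE))
})

# Numeric matrix with genes as row names, which is the shape barplot() expects
mat <- as.matrix(mydf[pct_cols])
rownames(mat) <- mydf$GENE_SYMBOL

# Confirm the columns are numeric before passing mat to barplot()
str(mat)
```

Passing `mat` to `barplot(mat, beside = TRUE, legend = TRUE)` groups the bars by cancer; `barplot(t(mat), ...)` groups them by gene.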

If we want `GENE_SYMBOL` on the x-axis, transpose the data so that each gene forms a group:

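```r
# Assumes the percent columns have already been converted to numeric.
# Transposing puts cancers in rows, so bars are grouped by gene, one colour per cancer.
barplot(t(`row.names<-`(mydf[-1], mydf$GENE_SYMBOL)), beside = TRUE,
        legend = TRUE, col = c("red", "green", "blue"))
```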

Alternatively, using `ggplot2`:

```r
library(dplyr)
library(tidyr)
library(ggplot2)
# Reshape to long format, convert the percent strings to numeric, then draw dodged bars
gather(mydf, Var, Val, -GENE_SYMBOL) %>%
  mutate(Val = as.numeric(sub("[%]", "", Val))) %>%
  ggplot(., aes(x = GENE_SYMBOL, y = Val)) +
  geom_bar(aes(fill = Var), position = "dodge", stat = "identity")
```
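In newer versions of tidyr, `gather` is superseded by `pivot_longer`, and in ggplot2 `geom_col` is shorthand for `geom_bar(stat = "identity")`. A possible equivalent of the reshape-and-plot approach is sketched below. The column names `Cancer` and `Percent` and the axis labels are my own choices, not part of the original data.

```r
library(tidyr)
library(ggplot2)

# Example data from the question, rebuilt so the snippet is self-contained
mydf <- data.frame(
  GENE_SYMBOL = c("NPM1", "DNMT3A", "TET2", "IDH1", "IDH2"),
  AML = c("28.00%", "24.00%", "8.00%", "9.00%", "10.00%"),
  CLL = c("0.00%", "8.00%", "0.00%", "3.00%", "1.00%"),
  MDS = c("7.00%", "28.00%", "7.00%", "10.00%", "3.00%"),
  stringsAsFactors = FALSE
)

# Long format: one row per gene/cancer pair ("Cancer" and "Percent" are assumed names)
long <- pivot_longer(mydf, cols = -GENE_SYMBOL,
                     names_to = "Cancer", values_to = "Percent")

# Strip the literal "%" and convert to numeric
long$Percent <- as.numeric(sub("%", "", long$Percent, fixed = TRUE))

# Dodged bars: genes on the x-axis, one bar per cancer within each gene
p <- ggplot(long, aes(x = GENE_SYMBOL, y = Percent, fill = Cancer)) +
  geom_col(position = "dodge") +
  labs(x = "Gene", y = "Percentage")
print(p)
```

Swapping `x = GENE_SYMBOL` and `fill = Cancer` in `aes()` would instead group the bars by cancer, as in the first `barplot` call.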