Jerry.Shad - 1 year ago
R Question

How to permute a list of data.frames and create an annotated stacked bar plot in ggplot2?

I have a list of data.frames that first needs to be permuted, and then I want an annotated stacked bar plot for each data.frame. I searched related posts on SO and got some ideas. My own attempt, however, is slow when the data.frames are large, and the plot it produces is not the stacked bar plot I want. I keep reading the ggplot2 vignettes for a solution, but I am stuck on permuting the list of data.frames in the desired way. How can I manipulate a list of data.frames and get an annotated stacked bar plot (labelled with the number of observations) easily and efficiently? Thanks in advance.

Reproducible data:

confirmedDF <- list(
  bar = data.frame(begin=seq(2, by=11, len=25), end=seq(8, by=11, len=25), score=sample(54,25)),
  cat = data.frame(begin=seq(5, by=8, len=35), end=seq(9, by=8, len=35), score=sample(45,35)),
  foo = data.frame(begin=seq(8, by=13, len=25), end=seq(17, by=13, len=25), score=sample(49,25))
)

discardedDF <- list(
  bar = data.frame(begin=seq(3, by=12, len=40), end=seq(8, by=12, len=40), score=sample(72,40)),
  cat = data.frame(begin=seq(9, by=15, len=50), end=seq(17, by=15, len=50), score=sample(60,50)),
  foo = data.frame(begin=seq(21, by=19, len=30), end=seq(32, by=19, len=30), score=sample(42,30))
)

Then I build my input list of data.frames:


library(magrittr)  # %<>% and %>%
library(tibble)    # rownames_to_column()
library(tidyr)     # separate()
library(dplyr)     # mutate(), group_by(), tally()

names(confirmedDF) <- paste("confirmed", names(confirmedDF), sep = ".")
names(discardedDF) <- paste("discarded", names(discardedDF), sep = ".")
# stack everything; row names become e.g. "confirmed.bar.1"
merged <- do.call(rbind, c(confirmedDF, discardedDF))
merged %<>% rownames_to_column(var = "cn")
merged %<>% separate(cn, c("list", "sample", "seq"), sep = "\\.")
merged %<>% mutate(stringency = ifelse(score >= 12, "Stringent", "Weak"))

res <- merged %>% split(list(.$sample, .$stringency, .$list))

My attempt to get each individual stacked bar plot is the following trivial code:


library(ggplot2)

lapply(res, function(ele_) {
  # count rows per sample / stringency / list within this piece
  plot_data <- ele_ %>%
    group_by(sample, stringency, list) %>%
    tally() %>%
    group_by(sample, stringency) %>%
    mutate(percentage = n / sum(n), cumsum = cumsum(percentage))

  ggplot(data = plot_data, aes(x = sample, y = n, fill = stringency)) +
    geom_bar(position = "dodge", stat = "identity")
})

Using lapply to build each bar plot is quite slow and inefficient, and the trivial code above does not give the bar plot I want. How can I optimize it? How can I permute the list of data.frames and get an annotated bar plot?

This is what I want to achieve for each sample:
[image: desired stacked bar plot]

How can I achieve the desired stacked bar plot? Any ideas?

Answer

You could try this:

res %>% 
  bind_rows() %>% 
  group_by(stringency, list, sample) %>% 
  tally() %>% 
  ungroup() %>% 
  # rename by position: var = stringency, val = list
  setNames(c("var", "val", "sample", "n")) %>% 
  # append a copy with var and val swapped, so both groupings get bars
  {bind_rows(., setNames(., c("val", "var", "sample", "n")))} %>% 
  ggplot(aes(x = var, y = n, fill = val)) + 
  geom_col() + 
  geom_text(aes(label = n), position = position_stack(vjust = 0.5))

[image: resulting plot]
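If one panel per sample is wanted, a simpler variant may be enough. The sketch below is not the code from the answer above: it assumes it is acceptable to stack the data.frames with `bind_rows(.id = ...)` instead of parsing row names, it drops the var/val duplication, and it adds `facet_wrap(~ sample)`. The helper `stack_list()` and the `set.seed(1)` call are my own additions for illustration. The column names `list`, `sample` and `stringency` follow the question.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sampled scores are repeatable

confirmedDF <- list(
  bar = data.frame(begin = seq(2, by = 11, len = 25), end = seq(8, by = 11, len = 25), score = sample(54, 25)),
  cat = data.frame(begin = seq(5, by = 8, len = 35), end = seq(9, by = 8, len = 35), score = sample(45, 35)),
  foo = data.frame(begin = seq(8, by = 13, len = 25), end = seq(17, by = 13, len = 25), score = sample(49, 25))
)
discardedDF <- list(
  bar = data.frame(begin = seq(3, by = 12, len = 40), end = seq(8, by = 12, len = 40), score = sample(72, 40)),
  cat = data.frame(begin = seq(9, by = 15, len = 50), end = seq(17, by = 15, len = 50), score = sample(60, 50)),
  foo = data.frame(begin = seq(21, by = 19, len = 30), end = seq(32, by = 19, len = 30), score = sample(42, 30))
)

# Hypothetical helper: stack one named list of data.frames, keeping the
# element name in `sample` and tagging every row with the source list.
stack_list <- function(x, list_name) {
  bind_rows(x, .id = "sample") %>%
    mutate(list = list_name)
}

merged <- bind_rows(
  stack_list(confirmedDF, "confirmed"),
  stack_list(discardedDF, "discarded")
) %>%
  mutate(stringency = ifelse(score >= 12, "Stringent", "Weak"))

# One row per sample / list / stringency with the number of observations
counts <- merged %>% count(sample, list, stringency)

# One stacked, annotated bar per list, one panel per sample
p <- ggplot(counts, aes(x = list, y = n, fill = stringency)) +
  geom_col() +
  geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
  facet_wrap(~ sample)

print(p)
```

Because everything is counted once on a single stacked table, no `lapply` over the split pieces is needed. If separate plot objects per sample are still required, the same `ggplot()` call can be applied to `split(counts, counts$sample)`.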
