Jerry.Shad - 12 days ago
R Question

How to correctly annotate a stacked bar plot with the actual number of observations from CSV files?

I have implemented a function that accepts a list of data.frames as input and filters each one by a threshold value. I can now export the filtered results as CSV files. To better understand the output, i.e. how many observations end up in each file, an annotated stacked bar plot would be a good option. How can I get an annotated bar plot for a list of CSV files? How should I process the CSV files to build the stacked bar plot? Any ideas are welcome. Thanks a lot.

Reproducible data:

set.seed(1)  # fix the seed so that sample() returns the same scores on every run
output <- list(
  bar = data.frame(begin=seq(2, by=14, len=45), end=seq(9, by=14, len=45), score=sample(60,45)),
  cat = data.frame(begin=seq(5, by=21, len=36), end=seq(13, by=21, len=36), score=sample(75,36)),
  foo = data.frame(begin=seq(8, by=18, len=52), end=seq(15, by=18, len=52), score=sample(100,52))
)


I implemented this function to filter the input list by threshold:

myFunc <- function(mList, threshold) {
  # check input param
  stopifnot(is.numeric(threshold))
  res <- lapply(mList, function(elm) {
    split(elm, ifelse(elm$score >= threshold, "saved", "droped"))
  })
  rslt <- lapply(names(res), function(elm) {
    mapply(write.csv,
           res[[elm]],
           paste0(elm, ".", names(res[[elm]]), ".csv"))
  })
  return(rslt)
}

#' @example
myFunc(output, 10)
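
As a cross-check on the files that myFunc writes, the expected row counts can be computed in memory with the same threshold rule. This is only a sketch: the small `output` list below is a stand-in for the one above, and `counts` is a name chosen for illustration.

```r
# Stand-in for the `output` list (only the score column matters here)
set.seed(1)
output <- list(
  bar = data.frame(score = sample(60, 45)),
  cat = data.frame(score = sample(75, 36))
)
threshold <- 10

# One column per list element, one row per group ("droped" / "saved")
counts <- vapply(output, function(df) {
  c(droped = sum(df$score < threshold),
    saved  = sum(df$score >= threshold))
}, c(droped = 0L, saved = 0L))

counts
```

Each column of `counts` should add up to the number of rows in the corresponding data.frame, which gives a quick way to confirm that nothing was lost when the CSV files were written.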


Now I have a set of CSV files, and I want an annotated stacked bar plot in which each file's segment is labeled with its actual number of observations. How can I do this efficiently?

This is a mockup of the desired plot:

[image: mockup of the desired plot]

Answer

Original Answer (pre-edit / comments):

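# list.files() returns names in alphabetical order (bar, cat, foo)
d <- list.files(pattern = "\\.droped\\.csv$")
s <- list.files(pattern = "\\.saved\\.csv$")
# count the rows in each file
dropped <- vapply(d, function(f) nrow(read.csv(f)), integer(1), USE.NAMES = FALSE)
saved   <- vapply(s, function(f) nrow(read.csv(f)), integer(1), USE.NAMES = FALSE)
tmp1 <- cbind(dropped, saved)
# label each row with its source name so the legend matches the file order
rownames(tmp1) <- sub("\\.droped\\.csv$", "", d)

# Stacked Bar Plot with Colors and Legend
barplot(tmp1, main = "CSV File Row Counts",
        ylab = "Number of Obs.", col = c("darkblue", "red", "green"),
        legend.text = rownames(tmp1))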

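
The base-graphics plot can also be annotated without ggplot2. The following is a minimal sketch, assuming a count matrix shaped like the one built above (one row per source, one column per filter group). The numbers in `counts` are made up for illustration, and `mids` is a hypothetical helper variable holding the vertical centre of each segment.

```r
# Illustrative counts only: rows are sources, columns are filter groups
counts <- matrix(c(7, 5, 5, 38, 31, 47), nrow = 3,
                 dimnames = list(c("bar", "cat", "foo"),
                                 c("dropped", "saved")))

# barplot() returns the x position of the centre of each stacked bar
bp <- barplot(counts, main = "CSV File Row Counts",
              ylab = "Number of Obs.",
              col = c("darkblue", "red", "green"),
              legend.text = rownames(counts))

# Vertical centre of each segment: top of the segment minus half its height
mids <- apply(counts, 2, function(x) cumsum(x) - x / 2)

# Matrices are stored column by column, so repeat each bar's x position
# once per segment to line up with `mids` and `counts`
text(x = rep(bp, each = nrow(counts)), y = mids,
     labels = counts, col = "white")
```

The white label colour is chosen only so the text stays readable on the dark fills; any colour works.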

Modified Answer (post-edit):

Based on the comments/edit I have revised the plot to include labels inside of the segments:

library(ggplot2)
Data      <- data.frame(obs    = c(dropped, saved),
                        # take each name from its file name so the labels
                        # follow the (alphabetical) file order:
                        name   = sub("\\..*$", "", c(d, s)),
                        filter = c(rep("Dropped",length(dropped)),
                                      rep("Saved", length(saved)))
)

ggplot(Data, aes(x = filter, y = obs, fill = name, label = obs)) +
  geom_bar(stat = "identity") +
  geom_text(size = 3, position = position_stack(vjust = 0.5))

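
To make the data frame fully programmatic, the name and filter group can both be parsed from the file names. Below is a rough sketch, assuming the files follow the `<name>.<saved|droped>.csv` pattern produced by the question's function. `count_csv_rows` is a hypothetical helper, not part of any package, and the demo files are written to a temporary directory purely for illustration.

```r
# Hypothetical helper: one row per "<name>.<saved|droped>.csv" file in `path`
count_csv_rows <- function(path = ".") {
  pattern <- "\\.(saved|droped)\\.csv$"
  files <- list.files(path, pattern = pattern)
  data.frame(
    name   = sub(pattern, "", files),                      # part before the group
    filter = sub("^.*\\.(saved|droped)\\.csv$", "\\1", files),
    obs    = vapply(file.path(path, files),
                    function(f) nrow(read.csv(f)),
                    integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Demo with two small made-up files
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
write.csv(data.frame(score = 1:3),  file.path(demo_dir, "bar.droped.csv"))
write.csv(data.frame(score = 4:10), file.path(demo_dir, "bar.saved.csv"))

Data <- count_csv_rows(demo_dir)
Data
```

The resulting `Data` has the same three columns as the data frame in the answer, so it can be passed to the same ggplot() call.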