Daniel Harris - 3 months ago
R Question

Order barplots in R based on fill value

This problem has come up many times on Stack Overflow, but I couldn't find a solution tailored to my particular problem.

I have a data frame which includes a column of species and a column of genome_names:

species                  genome_name
Acinetobacter baumannii  Acinetobacter baumanii BIDMC 56
Acinetobacter baumannii  Acinetobacter baumannii 1032359
Klebsiella pneumoniae    Klebsiella pneumoniae CHS 30
etc...


Using this code, I created a bar plot with one bar per species, where the height of each bar is the number of genome_name entries for that species:

library(ggplot2)
ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) +
geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) +
labs(title="Number of unique strains") +
labs(x = "Species",y="#Strains") + theme(legend.position="none") +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))


I would like to order the bars by increasing value of y (the number of genome_name entries per species). I tried converting my data to a factor, to no avail:

Error in `[<-.data.frame`(`*tmp*`, del, value = NULL) :
missing values are not allowed in subscripted assignments of data frames

Answer
library(ggplot2)
# Read the data and take a random sample of 300 rows to keep the example small
PATRIC_genomes_AMR_2_ris_subset <- read.csv("genomes_subset.csv", header = TRUE)
PATRIC_genomes_AMR_2_ris_subset <- dplyr::sample_n(PATRIC_genomes_AMR_2_ris_subset, 300)

# Sort the rows by species
PATRIC_genomes_AMR_2_ris_subset <- PATRIC_genomes_AMR_2_ris_subset[order(PATRIC_genomes_AMR_2_ris_subset$species),]


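# Order by genome_name: make the fill variable a factor whose levels are
# sorted by frequency, so the plot below (fill=genome_name) picks up the order
PATRIC_genomes_AMR_2_ris_subset <- within(PATRIC_genomes_AMR_2_ris_subset, 
                   genome_name  <- factor(genome_name, 
                                      levels=names(sort(table(genome_name), 
                                                        decreasing=TRUE))))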
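To see what the names(sort(table(...))) idiom produces, here is a minimal sketch on made-up data. The species names follow the question, but "Escherichia coli", the strain labels, and the counts are assumptions for illustration only, not values from the real data set.

```r
# Toy data: "Escherichia coli" and the strain labels are hypothetical.
toy <- data.frame(
  species = rep(c("Acinetobacter baumannii",
                  "Klebsiella pneumoniae",
                  "Escherichia coli"),
                times = c(2, 3, 1)),
  genome_name = paste("strain", 1:6),
  stringsAsFactors = FALSE
)

# table() counts rows per species, sort() orders those counts,
# and names() keeps only the labels, most frequent first.
counts <- table(toy$species)
levels_desc <- names(sort(counts, decreasing = TRUE))
print(levels_desc)

# Passing the labels as levels changes the level order of the factor,
# not the rows of the data frame.
toy$species <- factor(toy$species, levels = levels_desc)
print(levels(toy$species))
print(table(toy$species))
```

ggplot2 lays out a discrete axis (and stacks a discrete fill) in level order, which is why reordering the levels is enough to reorder the plot.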


ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) + 
  geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) + 
  labs(title="Number of unique strains") +
  labs(x = "Species",y="#Strains") + theme(legend.position="none") + 
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) 

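# Order by species: the most frequent species comes first.
# For increasing bar height, as the question asks, use decreasing=FALSE.
PATRIC_genomes_AMR_2_ris_subset <- within(PATRIC_genomes_AMR_2_ris_subset, 
                                          species <- factor(species, 
                                                         levels=names(sort(table(species), 
                                                         decreasing=TRUE))))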

ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) + 
  geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) + 
  labs(title="Number of unique strains") +
  labs(x = "Species",y="#Strains") + theme(legend.position="none") + 
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) 
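The question asks for bars in increasing order of height. A minimal sketch of that variant on made-up data follows. "Escherichia coli", the strain labels, and the counts are assumptions for illustration. The reorder() call is an alternative base R idiom, not something the answer uses, and the plot is built only if ggplot2 happens to be installed.

```r
# Toy data: "Escherichia coli" and the strain labels are hypothetical.
toy <- data.frame(
  species = rep(c("Acinetobacter baumannii",
                  "Klebsiella pneumoniae",
                  "Escherichia coli"),
                times = c(2, 3, 1)),
  genome_name = paste("strain", 1:6),
  stringsAsFactors = FALSE
)

# Same idiom as above, but sorted from the rarest species to the most common.
levels_inc <- names(sort(table(toy$species), decreasing = FALSE))
by_table <- factor(toy$species, levels = levels_inc)

# Alternative: reorder() ranks the levels by a per-group summary,
# here the group size, in increasing order.
by_reorder <- reorder(toy$species, toy$species, FUN = length)

print(levels(by_table))
stopifnot(identical(levels(by_table), levels(by_reorder)))

toy$species <- by_table

# Build the plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = species, fill = genome_name)) +
    geom_bar(colour = "black") +
    labs(x = "Species", y = "#Strains") +
    theme(legend.position = "none")
  # print(p) draws the bars from shortest to tallest
}
```

The toy counts are all distinct on purpose. With tied counts, the relative order of the tied species depends on how sort() breaks ties, so it is worth checking levels() before plotting.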


This is much the same as an earlier question, but yours asks about ordering by the fill value, genome_name, which is a little different. We also got to see how the ordering affects the run time, so it's not a duplicate.
