jaySf jaySf - 1 month ago 18
R Question

How can I ggplot a merged data frame composed of factors with an alphabetically sorted x-axis?

I have two data frames of nations, A and B; some countries appear in both. With rbind() and dplyr::summarise() I generate a new data frame with the counts of each country. To ggplot() it with an alphabetically sorted x-axis, I sort the data frame by country with order() and even drop the row names. Why do some of the merged countries still appear at the end of the x-axis in the resulting plot, and not in the alphabetical order I set? (BTW, isn't the order of a data frame alphabetical by default, even when it is merged?) Thanks for your help.

# Group A
ctry <- factor(c("ALB", "ALB", "ALB", "ALB", "BEL", "BIH", "CHE", "CHE", "CHE", "CHE", "CHE", "CHE", "CHE",
                 "DEU", "DEU", "ITA", "KOS", "KOS", "KOS", "SVK", "TUR", "TUR"))
df01 <- data.frame(ctry)

rm(ctry)

# Group B
ctry <- factor(c("ECU", "GHA", "CHE", "JAM", "ITA", "KOS", "TUR", "DOM"))
df02 <- data.frame(ctry)

# Group joined
df <- rbind(df01, df02)

# Countries Counts

library(dplyr)

df.sum <- df %>%
  group_by(ctry) %>%
  summarise(num = n()) %>%
  as.data.frame()

# alphabetic sorting

df.sum <- df.sum[order(df.sum[1]), ]
rownames(df.sum) <- NULL

df.sum

#    ctry num   # Here it's alphabetically sorted...
# 1   ALB   4
# 2   BEL   1
# 3   BIH   1
# 4   CHE   8
# 5   DEU   2
# 6   DOM   1
# 7   ECU   1
# 8   GHA   1
# 9   ITA   2
# 10  JAM   1
# 11  KOS   4
# 12  SVK   1
# 13  TUR   3

# ggplot # ...but not in the ggplot. -->

library(ggplot2)

ggplot(df.sum , aes(ctry)) + geom_bar(aes(weight = num, fill = ctry)) +
scale_fill_discrete(name="Countries")


[Image: the resulting bar plot]

Answer

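Factors are just integer codes with labels (the levels), so they may not combine the way you expect. With rbind, the combined factor keeps the df01 levels first and then appends any levels that are new in df02. ggplot draws a discrete axis in level order, not in row order, so sorting the rows does not change the plot. For example, these are the codes and levels of the group B factor: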

unclass(ctry)
[1] 3 4 1 6 5 7 8 2
attr(,"levels")
[1] "CHE" "DOM" "ECU" "GHA" "ITA" "JAM" "KOS" "TUR"
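
To see the level order that rbind produces, here is a minimal sketch using two tiny data frames. The names a, b and combined are made up for illustration and are not part of the question's data.

```r
# Hypothetical two-row stand-ins for df01 and df02
a <- data.frame(ctry = factor(c("ALB", "TUR")))
b <- data.frame(ctry = factor(c("DOM", "TUR")))

combined <- rbind(a, b)

# Levels of a come first; levels that are new in b are appended at the end
levels(combined$ctry)
# [1] "ALB" "TUR" "DOM"

# Sorting the rows does not touch the levels, so the axis order stays the same
sorted <- combined[order(as.character(combined$ctry)), , drop = FALSE]
levels(sorted$ctry)
# [1] "ALB" "TUR" "DOM"
```

This is why "DOM" would be drawn after "TUR" on a discrete axis even though the rows are sorted.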

I would use dplyr::bind_rows instead of rbind. It warns about the mismatched factor levels and works around the problem by coercing the column to character.

df2 <- bind_rows(df01, df02)
 Warning message:
 In bind_rows_(x, .id) : Unequal factor levels: coercing to character

Then ctry is a plain character column, and it will sort alphabetically in the plot, as you want.

  df.sum <- df2 %>%
    group_by(ctry) %>%
    summarise(num = n())

  ggplot...
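
Depending on the dplyr version, bind_rows may keep ctry as a factor instead of coercing it to character, so it can be safer not to rely on the coercion. As a hedged alternative in base R only, you can reset the level order yourself after rbind. The data frames below are shortened, hypothetical stand-ins for df01 and df02.

```r
# Shortened, hypothetical stand-ins for df01 and df02 from the question
df01 <- data.frame(ctry = factor(c("ALB", "CHE", "TUR")))
df02 <- data.frame(ctry = factor(c("ECU", "CHE", "DOM")))

df <- rbind(df01, df02)
levels(df$ctry)
# [1] "ALB" "CHE" "TUR" "DOM" "ECU"

# Option 1: keep the factor, but put its levels in alphabetical order
df$ctry <- factor(df$ctry, levels = sort(levels(df$ctry)))

# Option 2 (alternative): convert to character before summarising and plotting
# df$ctry <- as.character(df$ctry)

levels(df$ctry)
# [1] "ALB" "CHE" "DOM" "ECU" "TUR"
```

With the levels in alphabetical order, the group_by/summarise step and the ggplot call from the question can stay as they are.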