Christopher Yee - 3 months ago
R Question

dplyr join warning: joining factors with different levels

When using the join function in the dplyr package, I get this warning:

Warning message:
In left_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector


There is not a lot of information online about this. Any idea what it could be? Thanks!

Answer

That's not an error, that's a warning. It's telling you that one of the columns you used in your join was a factor, and that the factor had different levels in the two datasets. To avoid losing any information, the factors were converted to character values. For example:

library(dplyr)
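# stringsAsFactors = TRUE is required on R >= 4.0, where data.frame()
# no longer converts strings to factors by default
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)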

class(x$a) 
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x,y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector 

class(m$a)
# [1] "character"
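
To see in advance whether a join key will run into this, you can compare the levels of the two factors directly. The following is a minimal sketch in base R, reusing the toy data from the example above; the names same_levels, only_in_x and only_in_y are made up for illustration.

```r
# Same toy data as above; stringsAsFactors = TRUE forces factor columns
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

# TRUE only when both factors have exactly the same levels in the same order
same_levels <- identical(levels(x$a), levels(y$a))
same_levels
# [1] FALSE

# Levels that appear in one data frame but not in the other
only_in_x <- setdiff(levels(x$a), levels(y$a))
only_in_y <- setdiff(levels(y$a), levels(x$a))
only_in_x
# [1] "a" "b" "c"
only_in_y
# [1] "h" "i" "j"
```

If same_levels is FALSE, the key columns are the ones the warning is about.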

If you want the result to stay a factor, you can make sure that both factors have the same levels before joining:

combined <- sort(union(levels(x$a), levels(y$a)))
n <- left_join(mutate(x, a=factor(a, levels=combined)),
    mutate(y, a=factor(a, levels=combined)))
# Joining by: "a"
class(n$a)
# [1] "factor"
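
If you don't need the key to remain a factor, another option is to do the conversion to character yourself before joining, so nothing has to be coerced implicitly. This is only a sketch under that assumption, using the same toy data; x_chr, y_chr and p are illustrative names, and how the implicit coercion behaves may differ between dplyr versions, so check against the one you have installed.

```r
library(dplyr)

x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

# Convert the key column to character in both inputs up front
x_chr <- mutate(x, a = as.character(a))
y_chr <- mutate(y, a = as.character(a))

# Naming the key with `by` also avoids the "Joining by" message
p <- left_join(x_chr, y_chr, by = "a")

class(p$a)
# [1] "character"
```

You can turn the column back into a factor afterwards with factor(p$a) if a later step needs one.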