MFR - 8 days ago
R Question

Joining two datasets with different classes

I'm struggling with joining two data sets, df1 and df2:


id name1
1 a
2 b
3 c



id name2
1 c
2 d

I try to join them by their id column:


result <- left_join(df1, df2, by="id")

It gives me the following error:

Error: cannot join on columns 'id' x 'id':
Can't join on 'id' x 'id' because of
incompatible types (factor / integer)

because they have different classes:

sapply(df1, class)
id name1
"factor" "factor"

sapply(df2, class)
id name2
"integer" "factor"

I tried to change the classes to make them match:

df1$id <- as.integer(df1$id)

but it doesn't help to find the common rows in the two datasets
(it cannot recognize the matching "id"s in df2).


From help page: as.numeric(levels(f))[f] is recommended instead of as.numeric(as.character(f)).
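
Applied to a join like the one in the question, that recommendation might look like the sketch below. The data are hypothetical: the ids are 10, 20, 30 rather than 1, 2, 3, so that the factor codes visibly differ from the labels. Base merge() stands in for left_join so the snippet runs without dplyr.

```r
# Hypothetical data: df1$id is a factor, df2$id is an integer.
df1 <- data.frame(id = factor(c(10, 20, 30)), name1 = c("a", "b", "c"))
df2 <- data.frame(id = c(10L, 20L), name2 = c("c", "d"))

# as.integer() on a factor returns the internal codes, not the ids.
codes <- as.integer(df1$id)                    # 1 2 3

# Index the converted levels by the factor to recover the original ids.
df1$id <- as.integer(levels(df1$id))[df1$id]   # 10 20 30

# Left join in base R: keep every row of df1, fill unmatched rows with NA.
result <- merge(df1, df2, by = "id", all.x = TRUE)
```

Once both id columns are integers, left_join(df1, df2, by="id") should no longer raise the incompatible-types error.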

The issue with factor => numeric/integer conversion has been comprehensively answered by @Joshua Ulrich here.

Seek and ye shall find, but the user needs to know what to look for to reach the answer.

The Warning section in the documentation for ?factor states:

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
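
A small illustration of the difference between a factor's codes and its levels, using a made-up factor f (not taken from the question's data):

```r
# Made-up example: a factor built from numeric values.
f <- factor(c(10, 5, 10, 20))

# The levels are the sorted unique values, stored as strings.
lv <- levels(f)                                # "5" "10" "20"

# as.integer() returns positions in the level set, not the values.
codes <- as.integer(f)                         # 2 1 2 3

# Both conversions below recover the original values.
via_levels <- as.numeric(levels(f))[f]         # 10 5 10 20
via_character <- as.numeric(as.character(f))   # 10 5 10 20
```

The first form converts each level once and then indexes by the factor, while the second converts every element, which is why the help page calls the first slightly more efficient.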

This step could be avoided by setting stringsAsFactors=FALSE when reading the input data, which side-steps the conversion of character variables to factors. Keep factors only when they are absolutely essential, i.e. when the levels of the factors are required in the analysis.
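
A minimal sketch of the effect, assuming the data come from a CSV file. The inline text and the stray "unknown" entry are hypothetical, chosen only to show one way an id column can end up non-numeric. Note that since R 4.0.0 stringsAsFactors already defaults to FALSE in read.csv and data.frame.

```r
# Hypothetical input: one non-numeric entry keeps id from being read as integer.
csv_text <- "id,name1\n10,a\n20,b\nunknown,c"

# With stringsAsFactors = TRUE the id column becomes a factor.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factor$id)    # "factor"

# With stringsAsFactors = FALSE it stays a character vector.
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_char$id)      # "character"

# Converting characters yields the real values; bad entries become NA.
ids <- suppressWarnings(as.integer(as_char$id))   # 10 20 NA
```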