I have two dataframes with identical column names and dtypes, and I'm trying to combine them with something similar to the following:
```python
for column in df:
    if df[column].dtype.name == "category" and cdf[column].dtype.name == "category":
        union_categoricals([cdf[column], df[column]], ignore_order=True)
cdf = pd.concat([cdf, df])
```
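A minimal sketch of the underlying problem, using toy dataframes with a hypothetical categorical column `x` (not the asker's actual data): when two categorical columns have different category sets, `pd.concat` silently falls back to `object` dtype.

```python
import pandas as pd

# Toy dataframes with a categorical column "x" (hypothetical sample data)
df1 = pd.DataFrame({"x": pd.Categorical(["a", "b"])})
df2 = pd.DataFrame({"x": pd.Categorical(["b", "c"])})

# The category sets differ, so concat drops the categorical dtype
combined = pd.concat([df1, df2])
print(combined.x.dtype)  # object, not category
```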
I don't think this is completely obvious from the documentation, but you could do something like the following: use union_categoricals to get consistent categories across dataframes. Try df.x.cat.codes if you need to convince yourself that this works.
```python
from pandas.api.types import union_categoricals

uc = union_categoricals([df1.x, df2.x])
df1.x = pd.Categorical(df1.x, categories=uc.categories)
df2.x = pd.Categorical(df2.x, categories=uc.categories)
```
Concatenate and verify the dtype is categorical.
```python
df3 = pd.concat([df1, df2])
df3.x.dtypes
# category
```
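Putting the steps above together on hypothetical toy data (a categorical column `x` not taken from the question), `cat.codes` confirms that both frames end up sharing one category-to-code mapping, so the concatenated result stays categorical:

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Hypothetical sample data with mismatched category sets
df1 = pd.DataFrame({"x": pd.Categorical(["a", "b"])})
df2 = pd.DataFrame({"x": pd.Categorical(["b", "c"])})

# Build the union of categories and re-encode both columns against it
uc = union_categoricals([df1.x, df2.x])
df1.x = pd.Categorical(df1.x, categories=uc.categories)
df2.x = pd.Categorical(df2.x, categories=uc.categories)

# Identical categories mean concat preserves the categorical dtype
df3 = pd.concat([df1, df2])
print(df3.x.dtype)               # category
print(df3.x.cat.codes.tolist())  # codes are consistent across both frames
```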
As @C8H10N4O2 suggests, you could also just coerce from object back to categorical after concatenating. Honestly, for smaller datasets I think that's the best way to do it, just because it's simpler. But for larger dataframes, using union_categoricals should be much more memory efficient.
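The coercion alternative can be sketched on the same hypothetical toy data: concatenate first (which falls back to `object`), then cast the column back to categorical with `astype`.

```python
import pandas as pd

# Hypothetical sample data with mismatched category sets
df1 = pd.DataFrame({"x": pd.Categorical(["a", "b"])})
df2 = pd.DataFrame({"x": pd.Categorical(["b", "c"])})

# Mismatched categories -> concat yields an object column
df3 = pd.concat([df1, df2])

# Coerce back to categorical after the fact
df3["x"] = df3["x"].astype("category")
print(df3.x.dtype)  # category
```

The trade-off is the intermediate `object` column: every value is boxed as a Python object before being re-encoded, which is why the union_categoricals route tends to use less memory on large frames.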