I have a dataframe (df1) which has 50 columns, the first one is a cust_id and the rest are features. I also have another dataframe (df2) which contains only cust_id. I'd like to add one records per customer in df2 to df1 with all the features as 0. But as the two dataframe have two different schema, I cannot do a union. What is the best way to do that?
I use a full outer join but it generates two cust_id columns and I need one. I should somehow merge these two cust_id columns but don't know how.
You can try to achieve something like that by doing a full outer join like the following:
val result = df1.join(df2, Seq("cust_id"), "full_outer")
However, the features are going to be
null instead of 0. If you really need them to be zero, one way to do it would be:
val features = df1.columns.toSet - "cust_id" // Remove "cust_id" column val newDF = features.foldLeft(df2)( (df, colName) => df.withColumn(colName, lit(0)) ) df1.unionAll(newDF)