echo echo - 6 months ago 195
Scala Question

Spark unionAll multiple dataframes

Given a set of DataFrames:

val df1 = sc.parallelize(1 to 4).map(i => (i,i*10)).toDF("id","x")
val df2 = sc.parallelize(1 to 4).map(i => (i,i*100)).toDF("id","y")
val df3 = sc.parallelize(1 to 4).map(i => (i,i*1000)).toDF("id","z")


To union all of them, I do:

df1.unionAll(df2).unionAll(df3)


Is there a more elegant and scalable way of doing this for any number of DataFrames, for example starting from a sequence such as:

Seq(df1, df2, df3)

Answer

The simplest solution is to reduce with unionAll:

val dfs = Seq(df1, df2, df3)
dfs.reduce(_ unionAll _)
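
One caveat: reduce throws an UnsupportedOperationException on an empty sequence. If the list of DataFrames might be empty, reduceOption is a safer variant. Below is a minimal sketch; unionMany is a hypothetical helper name (not part of Spark), and plain Scala lists stand in for DataFrames so the snippet runs without a Spark session.

```scala
// Hypothetical helper (not a Spark API): combine a sequence with a binary
// "union" operation, returning None for an empty sequence instead of throwing.
def unionMany[A](items: Seq[A])(union: (A, A) => A): Option[A] =
  items.reduceOption(union)

// Plain Scala lists stand in for DataFrames here (an assumption made only
// to keep the sketch runnable without Spark).
val parts = Seq(List(1 -> 10), List(1 -> 100), List(1 -> 1000))

// Non-empty input: Some(all parts concatenated in order).
val combined = unionMany(parts)(_ ++ _)

// Empty input: None rather than an exception.
val nothing = unionMany(Seq.empty[List[(Int, Int)]])(_ ++ _)
```

With the DataFrames from the question, the call would presumably look like unionMany(Seq(df1, df2, df3))(_ unionAll _), which yields an Option[DataFrame]. Note that unionAll matches columns by position rather than by name, so the result keeps the column names of the first DataFrame (id, x).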