echo echo - 1 year ago 534
Scala Question

Spark unionAll multiple dataframes

For a set of dataframes

val df1 = sc.parallelize(1 to 4).map(i => (i,i*10)).toDF("id","x")
val df2 = sc.parallelize(1 to 4).map(i => (i,i*100)).toDF("id","y")
val df3 = sc.parallelize(1 to 4).map(i => (i,i*1000)).toDF("id","z")

to union all of them I chain `unionAll` pairwise:

df1.unionAll(df2).unionAll(df3)

Is there a more elegant and scalable way of doing this for any number of dataframes, for example from

Seq(df1, df2, df3)

Answer Source

The simplest solution is to reduce with unionAll:

val dfs = Seq(df1, df2, df3)
dfs.reduce(_ unionAll _)
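A note for newer Spark versions (an addition beyond the original answer): since Spark 2.0, `unionAll` is deprecated in favor of `union`, and `reduce` throws on an empty sequence, so a defensive variant of the same pattern might look like:

```scala
import org.apache.spark.sql.DataFrame

// Assumes df1, df2, df3 from the question are already in scope.
val dfs: Seq[DataFrame] = Seq(df1, df2, df3)

// Spark 2.0+ deprecates unionAll in favor of union; both stack rows and
// match columns by position, not by name.
val combined: DataFrame = dfs.reduce(_ union _)

// reduce throws on an empty Seq; reduceOption returns None instead.
val maybeCombined: Option[DataFrame] = dfs.reduceOption(_ union _)
```

Because the union is positional, the result keeps the first DataFrame's schema, so here the columns are named `id` and `x` even though `df2` and `df3` named their second columns `y` and `z`.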