formath - 3 months ago
Java Question

Why do I need to set a schema when transforming JavaRDD<Row> to DataFrame?

Why does the method

createDataFrame(JavaRDD<Row> javaRDD, StructType schema)

need a schema to be set when Row already has a schema in itself?

  • Row may have a schema field, but it is not required. By default, o.a.s.sql.Row simply sets schema to null.
  • A Row's schema doesn't affect its type, so there is no way to enforce a uniform data model across the RDD. An explicit DataFrame schema serves as a single source of truth.
  • It wouldn't be possible to determine the schema without reading the data, which would be yet another way to break laziness.
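
To make these points concrete, here is a minimal sketch, assuming Spark 2.x or later (where a DataFrame in Java is a Dataset<Row>) and a local master. The class name ExplicitSchemaExample, the helper personSchema() and the sample data are hypothetical and only serve as illustration. It shows that rows created with RowFactory carry no schema of their own, and that the schema passed to createDataFrame is known up front, without reading the data.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ExplicitSchemaExample {

    // Hypothetical schema for the sample data: declared up front,
    // so Spark does not have to inspect any row to know it.
    static StructType personSchema() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("name", DataTypes.StringType, false),
            DataTypes.createStructField("age", DataTypes.IntegerType, true)
        });
    }

    public static void main(String[] args) {
        // Assumption: a local Spark session is sufficient for the demo.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("explicit-schema-example")
            .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Rows built with RowFactory hold values only.
        List<Row> rows = Arrays.asList(
            RowFactory.create("Alice", 30),
            RowFactory.create("Bob", 25));
        JavaRDD<Row> rdd = jsc.parallelize(rows);

        // No schema is attached to such a row, so this is expected to be null.
        System.out.println("Row schema: " + rows.get(0).schema());

        // The explicit schema is the single source of truth for the DataFrame.
        Dataset<Row> df = spark.createDataFrame(rdd, personSchema());
        df.printSchema();

        spark.stop();
    }
}
```

Nothing in the type JavaRDD<Row> stops a Row with a different number or type of fields from being added to the same RDD. A mismatch with the declared schema would typically only surface at runtime, when the data is actually evaluated.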