I want to use Datasets instead of DataFrames.
I'm reading a parquet file and want to infer the types directly:
val df: Dataset[Row] = spark.read.parquet(path)
val df = spark.read.parquet(path).as[myCaseClass]
Why do you want to work with a Dataset? Presumably not just because you get the schema for free (the resulting DataFrame gives you that anyway), but because you get a type-safe schema.
You need an Encoder for your Dataset, and to get one you need a type that represents your records and hence the schema.
Either select your columns down to a reasonable number and use as[MyCaseClass], or accept what a Dataset[Row] (i.e. a DataFrame) gives you.
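A minimal sketch of that approach, assuming the parquet file has (hypothetical) columns id: Long and name: String and that path is the same variable as in your question:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical schema: the parquet file is assumed to contain
// columns `id: Long` and `name: String`.
case class Person(id: Long, name: String)

val spark = SparkSession.builder().appName("example").getOrCreate()
import spark.implicits._ // brings the Encoders for case classes into scope

// Narrow the DataFrame to the columns the case class covers,
// then convert it to a typed Dataset.
val ds: Dataset[Person] = spark.read.parquet(path)
  .select("id", "name")
  .as[Person]
```

Note that as[Person] will fail at runtime if the selected column names or types don't match the case class fields, so the import of spark.implicits._ and a matching case class are both required.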