Mohit Bansal - 2 months ago
Question

SparkR reading and writing dataframe issue

I have a Spark DataFrame which I want to write to disk. I used the following code:

write.df(data_frame, "dataframe_temp.csv", source = "csv", mode = "overwrite", schema = "true", header = "true")


It completed, and I can see a new folder created with a _SUCCESS file in it.

Now, when I try to read from the same file using the following code:

dataframe2 <- read.df("dataframe_temp.csv", inferSchema = "true", header = "true")


I am getting the following error:


ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at dataframe.csv. It must be specified manually;


I have even tried repartitioning the DataFrame into a single partition first:

data_frame <- repartition(data_frame, 1)


Any help?

Answer

You also have to specify the source as "csv". Without it, read.df falls back to Spark's default data source (parquet, per spark.sql.sources.default), which is why the error complains about ParquetFormat:

dataframe2 <- read.df("dataframe_temp.csv", source = "csv")
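
For completeness, a minimal round trip might look like this (a sketch, assuming an active SparkR session and that data_frame is an existing SparkDataFrame; the path is illustrative, and see the header caveat below before relying on header = "true"):

# Write out as CSV; this creates a directory of part files plus a _SUCCESS marker
write.df(data_frame, "dataframe_temp.csv", source = "csv", mode = "overwrite", header = "true")

# Read it back; source = "csv" is required, otherwise read.df tries the parquet default
dataframe2 <- read.df("dataframe_temp.csv", source = "csv", inferSchema = "true", header = "true")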

Regarding the header argument:

There is currently also a bug in SparkR for Spark 2.0 where the variable arguments of write.df are not passed on to the options parameter (see https://issues.apache.org/jira/browse/SPARK-17442). That is why the header is not written to the CSV even if you specify header="true" in write.df.
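
Until that is fixed, one possible workaround is to read the files back without a header and assign the column names yourself. This is only a sketch; "col1" and "col2" stand in for your actual column names:

# The header line was never written, so read the raw rows without a header
dataframe2 <- read.df("dataframe_temp.csv", source = "csv", inferSchema = "true", header = "false")

# Then set the column names manually on the SparkDataFrame
colnames(dataframe2) <- c("col1", "col2")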
