hdy hdy - 5 months ago 48
Python Question

Load multiple files into dataframe

Is it possible to load multiple files into a single dataframe? Normally, when I have one file to load, I call something like this:

file1 = "/a/b/c/folder/file1.csv"
dc = sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(file1)


But I want to load all files under the folder, i.e. /a/b/c/folder/*.csv.

hdy hdy
Answer

I think sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(folder) works, where folder is the path to the directory. The error I got earlier was because I was reading compressed files, and they were too large for the available memory.
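
For reference, here is a minimal sketch of the same idea using the glob from the question. It assumes sqlContext already exists as in the question and that load() accepts a glob pattern like /a/b/c/folder/*.csv (my understanding is that Spark resolves these the same way as a directory path). The helper names csv_glob and load_csv_folder are made up for illustration and are not part of any library.

```python
import posixpath


def csv_glob(folder, pattern="*.csv"):
    """Build a glob path such as /a/b/c/folder/*.csv from a folder path."""
    return posixpath.join(folder, pattern)


def load_csv_folder(sql_context, folder):
    """Load every CSV file under `folder` into one dataframe.

    `sql_context` is assumed to be an existing SQLContext (the
    `sqlContext` object from the question); it is not created here.
    """
    return (sql_context.read
            .format('com.databricks.spark.csv')
            .options(header='false', inferschema='true')
            .load(csv_glob(folder)))
```

Usage would then be dc = load_csv_folder(sqlContext, "/a/b/c/folder"). Passing the bare folder path to load(), as in the answer above, should give the same result when the folder only contains CSV files; the glob just makes the file selection explicit.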