hdy hdy - 1 year ago
Python Question

Load multiple files into dataframe

Is it possible to load multiple files into a single DataFrame? Normally, when I have one file to load, I call, for example:

file1 = "/a/b/c/folder/file1.csv"
dc = sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(file1)

But I want to load all the files under the folder.

hdy hdy
Answer

I think sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(folder) works. The error I got earlier happened because I was reading compressed files, and they were too large for the available memory.
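A minimal sketch of the answer above, assuming `sqlContext` is an existing PySpark SQLContext with the spark-csv package available. The same `load` call takes a folder (Spark reads every file in it), and Hadoop-style glob patterns such as '/a/b/c/folder/*.csv' are also accepted as paths. Passing a Python list of paths assumes a PySpark version whose `DataFrameReader.load` accepts a list (2.x does). The helper names `read_csv_target` and `local_csv_files` are hypothetical, not part of any library.

```python
import glob
import os


def read_csv_target(sql_context, target):
    """Read one or more CSV files into a single DataFrame.

    `target` may be a single file, a folder, a Hadoop-style glob pattern,
    or (assumption: PySpark version permitting) a list of file paths.
    `sql_context` is assumed to be a pyspark SQLContext with spark-csv loaded.
    """
    return (sql_context.read
            .format('com.databricks.spark.csv')
            .options(header='false', inferschema='true')
            .load(target))


def local_csv_files(folder):
    """Hypothetical helper: sorted list of the *.csv files directly in `folder`.

    Uses Python's glob, so it only sees the local filesystem (not HDFS/S3).
    Useful for building an explicit path list when the folder also holds
    files Spark should skip.
    """
    return sorted(glob.glob(os.path.join(folder, '*.csv')))
```

Usage, mirroring the question: `dc = read_csv_target(sqlContext, "/a/b/c/folder")` loads every file in the folder, while `read_csv_target(sqlContext, "/a/b/c/folder/*.csv")` restricts the read to CSV files only.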