Learner Learner - 1 year ago 119
Scala Question

Spark: How to generate file paths to read from S3 with Scala

How do I generate and load multiple S3 file paths in Scala so that I can use:

sqlContext.read.json ("s3://..../*/*/*")

I know I can use wildcards to read multiple files, but is there any way I can generate the paths? For example, my file structure looks like this:


These files are all JSON. The issue is that I need to load the files for just a specific duration. For example, with a duration of 16 days and a start day of Oct 16, I need to load the files for Oct 1 to Oct 16.

With a 28-day duration for the same start day, I would like to read from Sep 18.

Can someone tell me a way to do this?

Answer Source

You can take a look at this answer. You can specify whole directories, use wildcards, and even pass a comma-separated list of directories and wildcards. E.g.:
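For the date-window case in the question, here is a minimal sketch of how the per-day paths could be generated in plain Scala. It is not the asker's actual layout, which is not shown: it assumes one directory per day under `<base>/yyyy/MM/dd/`, and the names `S3DatePaths`, `pathsFor`, and the bucket prefix used in the usage note are hypothetical. The window is counted inclusively, so it ends on the given day and contains exactly `days` days.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object S3DatePaths {
  // Assumed (hypothetical) layout: <base>/yyyy/MM/dd/<files>.
  // Change the pattern to match the real directory structure.
  private val dayFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  /** One wildcard path per day, for `days` consecutive days ending on
    * (and including) `lastDay`, returned in chronological order.
    */
  def pathsFor(base: String, lastDay: LocalDate, days: Int): Seq[String] =
    (days - 1 to 0 by -1).map { offset =>
      // Step back `offset` days from the last day and format it as a directory.
      s"$base/${lastDay.minusDays(offset.toLong).format(dayFormat)}/*"
    }
}
```

With an assumed year of 2016, `S3DatePaths.pathsFor("s3://my-bucket/events", LocalDate.of(2016, 10, 16), 16)` covers Oct 1 through Oct 16. With 28 days the inclusive window starts on Sep 19, so pass 29 if Sep 18 should be included. The result can be joined with `paths.mkString(",")` and given to `sc.textFile`, which accepts a comma-separated list of paths. If your Spark version's reader accepts multiple paths, `sqlContext.read.json(paths: _*)` should also work.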


Or you can use the AWS API to get the list of file locations and read those files using Spark.

You can look into this answer to AWS S3 file search.