Stéphane Soulier - 2 months ago
JSON Question

How to read all files in a directory with spark_read_json from sparklyr

I have JSON events stored locally (for debugging) with this structure:

events/year/month/day/hour/somefiles.log

Each .log file contains one JSON object per line (my events).

How can I load these files recursively with spark_read_json from the sparklyr package? I tried:

library(sparklyr)

sc = spark_connect(master = "local")
events = spark_read_json(sc = sc, name = "events", path = "events/*")


but without success.

Edit 1

In fact, the wildcard works at some depths of the path but not others. For example,

events = spark_read_json(sc = sc, name = "events", path = "events/year/month/day/*")

works, but

events = spark_read_json(sc = sc, name = "events", path = "events/year/month/*")

does not.

Answer

A single * matches only one directory level, so you need to specify the depth of the path search with one wildcard per level. Try:

events = spark_read_json(sc = sc, name = "events", path = "events/year/month/*/*")
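Applied from the top of the tree, that means one * per directory level down to the files. A minimal sketch, assuming the four levels year/month/day/hour shown in the question (adjust the number of wildcards to your actual depth):

library(sparklyr)

sc <- spark_connect(master = "local")

# One "*" per directory level (year/month/day/hour), then one for the
# files themselves. spark_read_json reads line-delimited JSON regardless
# of the .log extension.
events <- spark_read_json(
  sc   = sc,
  name = "events",
  path = "events/*/*/*/*/*.log"
)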