How can I save a PySpark DataFrame to a real JSON file?
Following the documentation, I have tried to read the result back with
d = json.load(open('myfile.json'))
Is there a way to do it? Not really, or at least not in an elegant way. You could convert the data to a Python RDD, compute partition statistics, and build the complete document manually, but that looks like a waste of time.
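The underlying issue is that Spark's JSON writer produces newline-delimited JSON (one object per line), not a single JSON document, which is why json.load on the whole file fails. A minimal pure-Python illustration of the difference (the sample data is invented for demonstration):

```python
import json

# Simulated contents of a file written by df.write.json:
# one JSON object per line (JSON Lines), not one JSON array.
jsonlines = '{"a": 1}\n{"a": 2}\n'

# Parsing the whole file as one document fails with "Extra data".
try:
    json.loads(jsonlines)
    parsed_as_document = True
except json.JSONDecodeError:
    parsed_as_document = False

# Parsing line by line works: each line is a valid JSON object.
records = [json.loads(line) for line in jsonlines.splitlines()]
```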
If you want to get a list of dicts, just parse the file(s) line by line:
import json

with open('myfile.json') as fr:
    dicts = [json.loads(line) for line in fr]
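If you then need a real JSON document, you can dump the parsed list back out with json.dump. A minimal sketch, assuming the JSON Lines output has already been collected into one local file (the file names here are illustrative, not what Spark produces):

```python
import json
import os
import tempfile

# Illustrative stand-in for a file written by df.write.json (JSON Lines).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "myfile.jsonl")
with open(src, "w") as fw:
    fw.write('{"x": 1}\n{"x": 2}\n')

# Parse line by line into a list of dicts.
with open(src) as fr:
    dicts = [json.loads(line) for line in fr]

# Write a single, valid JSON document (a JSON array).
dst = os.path.join(workdir, "myfile.json")
with open(dst, "w") as fw:
    json.dump(dicts, fw)
```

Note that this pulls everything into local memory, so it is only practical for data small enough to fit on one machine.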