martina - 1 year ago

PySpark save DataFrame to actual JSON file

How can I save a PySpark DataFrame to a real JSON file?

Following the documentation, I have tried


It works, but it saves the file as a series of dictionaries, one per line, and the result is not read properly by:

import json
d = json.load(open('myfile.json'))
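To see why that snippet fails, here is a minimal sketch that uses an in-memory string in place of myfile.json (the two sample records are made up). json.load parses the whole file as a single JSON document, so a second top-level object on the next line is rejected; wrapping the same records in brackets turns them into one valid document.

```python
import json

# Two made-up records in "one JSON object per line" layout.
json_lines = '{"a": 1}\n{"a": 2}\n'

error_msg = None
try:
    # json.load(f) behaves like json.loads(f.read()): one document expected.
    json.loads(json_lines)
except json.JSONDecodeError as err:
    # The decoder stops after the first object and finds leftover text.
    error_msg = err.msg

# The same records joined with commas inside brackets form one JSON array.
as_list = json.loads("[" + ",".join(json_lines.splitlines()) + "]")
print(error_msg, as_list)
```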

I would like the file to contain a list of dictionaries. Is there a way?

Answer

Is there a way to do it? Not really, or at least not an elegant one. You could convert the data to a Python RDD, compute partition statistics, and build the complete document manually, but that looks like a waste of time.
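A rough idea of what "build the complete document manually" involves, sketched in plain Python under stated assumptions: each inner list is a hypothetical stand-in for one partition's JSON strings (in Spark the per-partition step would have to run inside something like `mapPartitionsWithIndex`), and the function names are illustrative, not a Spark API. The partition statistics are record counts, which tell each partition whether it needs a leading comma; the driver then wraps everything in brackets.

```python
import json
from itertools import accumulate


def partition_fragments(partitions):
    """Turn per-partition lists of JSON strings into pieces of one JSON array.

    `partitions` is a hypothetical stand-in: one list of JSON-encoded
    records per partition.
    """
    # "Partition statistics": number of records in each partition.
    counts = [len(p) for p in partitions]
    # Records held by all earlier partitions (0 for the first partition).
    before = [0] + list(accumulate(counts))[:-1]
    fragments = []
    for records, n_before in zip(partitions, before):
        body = ",".join(records)
        # A comma is needed only if some earlier partition emitted a record.
        if records and n_before > 0:
            body = "," + body
        fragments.append(body)
    return fragments


def assemble_document(partitions):
    """Concatenate the fragments and wrap them in brackets."""
    return "[" + "".join(partition_fragments(partitions)) + "]"


parts = [['{"a": 1}', '{"a": 2}'], [], ['{"a": 3}']]
doc = assemble_document(parts)
print(doc)  # prints [{"a": 1},{"a": 2},{"a": 3}]
```

Even in this toy form, the bookkeeping (empty partitions, comma placement, bracket wrapping) shows why the approach is not worth it.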

If you want a list of dicts, just parse the file(s) line by line:

import json

with open('myfile.json') as fr:
    dicts = [json.loads(line) for line in fr if line.strip()]
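Spark's file writers typically produce a directory of `part-*` files rather than a single file, which is why more than one file may need parsing. The sketch below extends the line-by-line approach to such a directory and, if you still want a real JSON array on disk, rewrites the records with json.dump. It assumes uncompressed part files matching `part-*`; the helper names `read_json_lines_dir` and `to_json_array_file` are hypothetical.

```python
import glob
import json
import os


def read_json_lines_dir(path):
    """Collect records from every part file in a line-delimited JSON directory.

    Assumes uncompressed files named part-*; marker files such as _SUCCESS
    do not match the pattern and are skipped.
    """
    records = []
    for part in sorted(glob.glob(os.path.join(path, "part-*"))):
        with open(part) as fr:
            # One JSON object per line; blank lines are ignored.
            records.extend(json.loads(line) for line in fr if line.strip())
    return records


def to_json_array_file(src_dir, dest_file):
    """Write all records as a single JSON array that json.load can read."""
    with open(dest_file, "w") as fw:
        json.dump(read_json_lines_dir(src_dir), fw)
```

This loads every record into memory on one machine, so it only makes sense for output small enough to fit there.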