mpen mpen - 8 months ago 71
JSON Question

Split 95mb JSON array into smaller chunks?

I exported some data from my database in the form of JSON, which is essentially just 1 [list] with a bunch (900K) of {objects} inside it.

Trying to import it on my production server now, but I've got some cheap web server. They don't like it when I eat all their resources for 10 minutes.

How can I split this file into smaller chunks so that I can import it piece by piece?




Edit: Actually, it's a PostgreSQL database. I'm open to other suggestions on how I can export all the data in chunks. I've got phpPgAdmin installed on my server, which supposedly can accept CSV, Tabbed and XML formats.




I had to fix phihag's script:

import json
with open('fixtures/PostalCodes.json','r') as infile:
o = json.load(infile)
chunkSize = 50000
for i in xrange(0, len(o), chunkSize):
with open('fixtures/postalcodes_' + ('%02d' % (i//chunkSize)) + '.json','w') as outfile:
json.dump(o[i:i+chunkSize], outfile)





dump:

pg_dump -U username -t table database > filename


restore:

psql -U username < filename


(I don't know what the heck pg_restore does, but it gives me errors)

The tutorials on this conveniently leave this information out, esp. the
-U
option which is probably necessary in most circumstances. Yes, the man pages explain this, but it's always a pain to sift through 50 options you don't care about.




I ended up going with Kenny's suggestion... although it was still a major pain. I had to dump the table to a file, compress it, upload it, extract it, then I tried to import it, but the data was slightly different on production and there were some missing foreign keys (postalcodes are attached to cities). Of course, I couldn't just import the new cities, because then it throws a duplicate key error instead of silently ignoring it, which would have been nice. So I had to empty that table, repeat the process for cities, only to realize something else was tied to cities, so I had to empty that table too. Got the cities back in, then finally I could import my postal codes. By now I've obliterated half my database because everything is tied to everything and I've had to recreate all the entries. Lovely. Good thing I haven't launched the site yet. Also "emptying" or truncating a table doesn't seem to reset the sequences/autoincrements, which I'd like, because there are a couple magic entries I want to have ID 1. So..I'd have to delete or reset those too (I don't know how), so I manually edited the PKs for those back to 1.

I would have ran into similar problems with phihag's solution, plus I would have had to import 17 files one at a time, unless I wrote another import script to match the export script. Although he did answer my question literally, so thanks.

Answer Source

Assuming you have the option to go back and export the data again...:

pg_dump - extract a PostgreSQL database into a script file or other archive file.

pg_restore - restore a PostgreSQL database from an archive file created by pg_dump.

If that's no use, it might be useful to know what you're going to be doing with the output so that another suggestion can hit the mark.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download