Mpondomise Mpondomise - 29 days ago 10
Python Question

Python: convert one JSON structure to a nested structure

How can I convert the following JSON format to the target format below? I have 50 thousand entries.

Basically, I want to collect the unique countries in the array and group all entries that share a country name under one array.

original json:

[
    {
        "unilist": [
            {
                "country": "United States",
                "name": "The College of New Jersey",
                "web_page": "http://www.tcnj.edu"
            },
            {
                "country": "United States",
                "name": "Abilene Christian University",
                "web_page": "http://www.acu.edu/"
            },
            {
                "country": "United States",
                "name": "Adelphi University",
                "web_page": "http://www.adelphi.edu/"
            },
            {
                "country": "China",
                "name": "Harbin Medical University",
                "web_page": "http://www.hrbmu.edu.cn/"
            },
            {
                "country": "China",
                "name": "Harbin Normal University",
                "web_page": "http://www.hrbnu.edu.cn/"
            }
            ...
        ]
    }
]


target format:

{
    "unilist" : {
        "United States" : [
            {"name" : "The College of New Jersey", "web_page" : "http://www.tcnj.edu"},
            {"name" : "Abilene Christian University", "web_page" : "http://www.acu.edu/"},
            {"name" : "Adelphi University", "web_page" : "http://www.adelphi.edu/"}
        ],
        "China" : [
            {"name" : "Harbin Medical University", "web_page" : "http://www.hrbmu.edu.cn/"},
            {"name" : "Harbin Normal University", "web_page" : "http://www.hrbnu.edu.cn/"}
        ],
        ...
    }
}


update



My attempt (in Python 2.7.11), based on the answer provided by downshift, is not working as expected. I get the following TypeError:

from collections import defaultdict
import json
from pprint import pprint

with open('old_list.json') as orig_json:
    newlist = defaultdict(list)

    for country in orig_json[0]['unilist']:
        newlist[country['country']].append({'name': country['name'], 'web_page': country['web_page']})

with open('new_list.json', 'w') as fp:
    json.dump(newlist,fp)


pprint.pprint(dict(newlist))




TypeError:

Traceback (most recent call last):
  File "convert.py", line 8, in <module>
    for country in orig_json[0]['unilist']:
TypeError: 'file' object has no attribute '__getitem__'

Answer

This produces almost the target output; only the "unilist" key is missing. It does at least group the entries by country:

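from collections import defaultdict
import pprint

# orig_json is assumed to be the parsed JSON (a list), e.g. the result of json.load()
newlist = defaultdict(list)

# Append each entry to its country's list, keeping only name and web_page
for country in orig_json[0]['unilist']:
    newlist[country['country']].append({'name': country['name'], 'web_page': country['web_page']})

pprint.pprint(dict(newlist))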

output:

{'China': [{'name': 'Harbin Medical University',
            'web_page': 'http://www.hrbmu.edu.cn/'},
           {'name': 'Harbin Normal University',
            'web_page': 'http://www.hrbnu.edu.cn/'}],
 'United States': [{'name': 'The College of New Jersey',
                    'web_page': 'http://www.tcnj.edu'},
                   {'name': 'Abilene Christian University',
                    'web_page': 'http://www.acu.edu/'},
                   {'name': 'Adelphi University',
                    'web_page': 'http://www.adelphi.edu/'}]}
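
One possible way to add the missing "unilist" key is to wrap the grouped dictionary in an outer dictionary. This is only a minimal sketch, assuming the input has the shape shown in the question; group_by_country and sample are names made up for illustration:

```python
from collections import defaultdict

def group_by_country(orig_json):
    """Group universities by country and wrap the result in a 'unilist' key."""
    grouped = defaultdict(list)
    for uni in orig_json[0]['unilist']:
        grouped[uni['country']].append(
            {'name': uni['name'], 'web_page': uni['web_page']})
    # Convert to a plain dict so it prints and serializes like ordinary JSON
    return {'unilist': dict(grouped)}

# Hypothetical two-entry sample in the same shape as the original JSON
sample = [{'unilist': [
    {'country': 'China', 'name': 'Harbin Medical University',
     'web_page': 'http://www.hrbmu.edu.cn/'},
    {'country': 'China', 'name': 'Harbin Normal University',
     'web_page': 'http://www.hrbnu.edu.cn/'},
]}]

result = group_by_country(sample)
```

Here result has a single top-level "unilist" key whose value maps each country to its list of name/web_page entries.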

I'll update this answer if I figure out a better way to get the exact target output. In the meantime, I hope this helps.
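
As for the TypeError in the question's update: open() returns a file object, not the parsed data, so it cannot be indexed with [0]. Parsing the file with json.load first should get past that. A sketch under the assumption that the file has the layout shown in the question; convert is a hypothetical helper name, and the file paths are passed in rather than hard-coded:

```python
import json
from collections import defaultdict

def convert(src_path, dst_path):
    """Read the original JSON, group entries by country, write the nested form."""
    with open(src_path) as src:
        # Parse the file contents; the file object itself is not indexable
        orig_json = json.load(src)

    grouped = defaultdict(list)
    for uni in orig_json[0]['unilist']:
        grouped[uni['country']].append(
            {'name': uni['name'], 'web_page': uni['web_page']})

    nested = {'unilist': dict(grouped)}
    with open(dst_path, 'w') as dst:
        json.dump(nested, dst, indent=2)
    return nested
```

Calling convert('old_list.json', 'new_list.json') would then write the grouped structure to the new file. The loop makes a single pass over the entries, so 50 thousand of them should not be an issue.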