s.matthew.english - 8 days ago
JSON Question

Adding an incremental counter to a loop in a Python parsing script

At the moment I'm using the following Python script:

import json
from collections import defaultdict
from pprint import pprint

with open('prettyPrint.txt') as data_file:
    data = json.load(data_file)

locations = defaultdict(list)

for item in data['data']:
    location = item['relationships']['location']['data']['id']
    locations[location].append(item['id'])

pprint(locations)


to parse some messy JSON data like this:

{
  "links": {
    "self": "http://localhost:2510/api/v2/jobs?skills=data%20science"
  },
  "data": [
    {
      "id": 121,
      "type": "job",
      "attributes": {
        "title": "Data Scientist",
        "date": "2014-01-22T15:25:00.000Z",
        "description": "Data scientists are in increasingly high demand amongst tech companies in London. Generally a combination of business acumen and technical skills are sought. Big data experience ..."
      },
      "relationships": {
        "location": {
          "links": {
            "self": "http://localhost:2510/api/v2/jobs/121/location"
          },
          "data": {
            "type": "location",
            "id": 3
          }
        },
        "country": {
          "links": {
            "self": "http://localhost:2510/api/v2/jobs/121/country"
          },
          "data": {
            "type": "country",
            "id": 1
          }
        },


At this point the output looks like this:

85: [36026,
36028,
36032,
36027,
217897,
286398,
315064,
320879,
322303,
322608,
322611,
323199,
325659,
327652],
88: [13690,
13693,
13689,
13692,
13691,
16454,
16453,
28002,
28003,
28004,
28001,
114667,
233319,
233329,
263814,
271490,
271571,
271569,
271570,
291274,
291275,
300376,
300373,
301293,
301295,
304286,
304285,
320425,
320426,
320424,
320431,
320430,
321284,
321281,
321283,
321282,
321280,
324345,
327926,
347985,
358537,
358549,
357807,
364541,
358431,
334990,
359241],


But I'd like to change it so that the output looks like this:

...
87: 02
88: 73
89: 15
90: 104
...


I know I need to put some kind of i=0, i++ into that loop somewhere, but I can't figure out how to do that.

Answer

You only need the number of items per location, not the items themselves, so the values in the locations dict can be plain integers instead of lists. Use int as the default factory for defaultdict:

locations = defaultdict(int)
# any missing key starts with a default value of `0`

and change your for loop to:

for item in data['data']:
    location = item['relationships']['location']['data']['id']
    locations[location] += 1   # increase the count by `1`
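
Putting those pieces together, a minimal self-contained sketch might look like the following. The inline `data` dict is made-up sample data that only mimics the structure shown in the question (it stands in for the result of `json.load`), and the final formatting step is one possible way to get one `location: count` line per location, sorted by location id.

```python
from collections import defaultdict

# Hypothetical sample shaped like the JSON in the question (not the real file).
data = {
    "data": [
        {"id": 1, "relationships": {"location": {"data": {"type": "location", "id": 88}}}},
        {"id": 2, "relationships": {"location": {"data": {"type": "location", "id": 85}}}},
        {"id": 3, "relationships": {"location": {"data": {"type": "location", "id": 88}}}},
    ]
}

locations = defaultdict(int)  # missing keys start at 0

for item in data["data"]:
    location = item["relationships"]["location"]["data"]["id"]
    locations[location] += 1  # count the item instead of storing its id

# Build one "location: count" line per location, ordered by location id.
lines = ["{}: {}".format(location, count)
         for location, count in sorted(locations.items())]
print("\n".join(lines))
```

No separate `i = 0` / `i += 1` variable is needed here, because the dict value for each location is itself the counter.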

Or, even better, use collections.Counter with a generator expression, as mentioned by @TigerhawkT3:

from collections import Counter

Counter(item['relationships']['location']['data']['id'] for item in data['data'])
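
As a rough illustration of how the Counter result could be used, here is a small sketch. The helper name `count_by_location` and the `sample` data are assumptions made up for this example; `Counter.most_common` and the zero default for missing keys are standard Counter behaviour.

```python
from collections import Counter


def count_by_location(data):
    """Count items per location id (hypothetical helper, for illustration only)."""
    return Counter(
        item["relationships"]["location"]["data"]["id"] for item in data["data"]
    )


# Made-up sample shaped like the JSON in the question.
sample = {
    "data": [
        {"id": 1, "relationships": {"location": {"data": {"type": "location", "id": 88}}}},
        {"id": 2, "relationships": {"location": {"data": {"type": "location", "id": 85}}}},
        {"id": 3, "relationships": {"location": {"data": {"type": "location", "id": 88}}}},
    ]
}

counts = count_by_location(sample)

# Print one "location: count" line per location, ordered by location id.
for location, count in sorted(counts.items()):
    print("{}: {}".format(location, count))

# The location with the most items, as a list of (location, count) pairs.
busiest = counts.most_common(1)
```

A Counter is a dict subclass, so it can be iterated and sorted like the defaultdict version, and looking up a location that never appeared returns 0 rather than raising a KeyError.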