Ninjaboy12 - 2 months ago
Python Question

Split large JSON file into batches of 100 at a time to run through an API

I am close to finishing a tool I am developing, but as a junior developer with no senior programmer to work with, I am stuck. I have a Python script that takes data from our database and converts it to JSON to be run through an address validation API. It all works, except that the API only accepts 100 objects at a time. I need to break up the file of X objects into batches of 100, run each batch, and store all the results in the same output file. Here is a snippet of my script structure:

for row in rows:
    d = collections.OrderedDict()
    d['input_id'] = str(row.INPUT_ID)
    d['addressee'] = row.NAME
    d['street'] = row.ADDRESS
    d['city'] = row.CITY
    d['state'] = row.STATE
    d['zipcode'] = row.ZIP
    d['candidates'] = row.CANDIDATES
    obs_list.append(d)

json.dump(obs_list, file)

ids_file = '.csv'

cur.execute(input_ids)
columns = [i[0] for i in cur.description]
ids_input = cur.fetchall()

#ids_csv = csv.writer(with open('.csv','w',newline=''))
with open('.csv','w',newline='') as f:
    ids_csv = csv.writer(f,delimiter=',')
    ids_csv.writerow(columns)
    ids_csv.writerows(ids_input)

print('Run through API')

url = 'https://api.'
headers = {'content-type': 'application/json'}


This is where I assume I need a loop to break it up:

with open('.json', 'r') as run:
    dict_run = run.readlines()
    dict_ready = (''.join(dict_run))

#lost :(
for object in dict_ready:
    # do something with object to only run 100 at a time
    pass

r = requests.post(url, data=dict_ready, headers=headers)
ss_output = r.text

output = 'C:\\Users\\TurnerC1\\Desktop\\ss_output.json'

with open(output,'w') as of:
    of.write(ss_output)


At the moment I have about 4,000 of these objects in a file to run through the API, which only accepts 100 at a time. I'm sure there is an easy answer; I am just burnt out doing this by myself, lol. Any help is greatly appreciated.

Sample JSON:

[
    {
        "street":"1 Santa Claus",
        "city":"North Pole",
        "state":"AK",
        "candidates":10
    },
    {
        "addressee":"Apple Inc",
        "street":"1 infinite loop",
        "city":"cupertino",
        "state":"CA",
        "zipcode":"95014",
        "candidates":10
    }
]

Answer

Try this as your second chunk of code:

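ss_output = []
with open('.json', 'r') as run:
    # json.load reads from a file object (json.loads expects a string)
    dict_run = json.load(run)

# step through the list 100 objects at a time; the last slice may be shorter
for i in range(0, len(dict_run), 100):
    dict_ready = json.dumps(dict_run[i:i+100])
    r = requests.post(url, data=dict_ready, headers=headers)
    r.raise_for_status()  # stop on an HTTP error instead of saving partial output
    # assumes the API responds with a JSON array of results
    ss_output.extend(r.json())

output = 'C:\\Users\\TurnerC1\\Desktop\\ss_output.json'

with open(output, 'w') as of:
    json.dump(ss_output, of)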
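
If it helps, the batching can also be pulled out into a small helper so the slicing is easy to check without hitting the API. This is only a sketch: `chunked` and `post_in_batches` are made-up names, not part of your script, and `send` is a hypothetical callable standing in for the `requests.post(...)` call, assumed to return the parsed list of results for one batch.

```python
import json


def chunked(items, size=100):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_in_batches(items, send, size=100):
    """Send `items` in batches and collect all results in one list.

    `send` is a hypothetical callable: it takes the JSON string for one
    batch and returns a list of results for that batch.
    """
    results = []
    for batch in chunked(items, size):
        results.extend(send(json.dumps(batch)))
    return results
```

With the real API, `send` could be something like `lambda body: requests.post(url, data=body, headers=headers).json()`, again assuming the response body is a JSON array. Stepping the range over the full length of the list means a final partial batch (for example, the last 50 of 4,050 objects) still gets sent.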