lte__ lte__ -4 years ago 135
Python Question

Python - Increase read/modify/write speed?

I have a GeoJSON file that breaks a certain geographical area down into ca. 7000 cells. I'd like to a) open this GeoJSON, b) modify some data (see code below) and c) write the modified GeoJSON to disk. My problem is that, since there are a lot of cells, this takes almost a minute. Do you see any way to improve the speed of this function? Thank you!

import json

def writeGeoJSON(param1, param2, inputdf):
    with open('ingeo.geojson') as f:
        data = json.load(f)
    for feature in data['features']:
        # Rows for this cell that also match both parameters
        currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & (inputdf['param1']==param1) & (inputdf['param2']==param2)]
        if (len(currentfeature) > 0):
            feature['properties'].update({"style": {"opacity": currentfeature.Opacity.item()}})
        else:
            # No matching row: make the cell fully transparent
            feature['properties'].update({"style": {"opacity": 0}})
    with open('outgeo.geojson', 'w') as outfile:
        json.dump(data, outfile)

Answer Source

There is a straightforward serial optimization available in your code. You have the line:

currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & (inputdf['param1']==param1) & (inputdf['param2']==param2)]

Notice that the last two checks do not depend on the loop variable, so they can be moved outside the for loop. As written, they are recomputed over the whole DataFrame on every iteration, which wastes a lot of CPU time. You can rewrite it as:

paramMatch = (inputdf['param1']==param1) & (inputdf['param2']==param2)
for feature in data['features']:
    currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & paramMatch]

That should make your program run noticeably faster!
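Taking the same idea one step further (this goes beyond the change suggested above, so treat it as a sketch rather than the answer's method): the DataFrame can be filtered once and turned into a plain dictionary, so the loop does a dictionary lookup per feature instead of a DataFrame comparison. The helper names `build_opacity_lookup` and `apply_opacity` are made up for illustration, and the code assumes at most one matching row per `SId`.

```python
import pandas as pd


def build_opacity_lookup(inputdf: pd.DataFrame, param1, param2) -> dict:
    """Filter on param1/param2 once and map each SId to its Opacity."""
    mask = (inputdf["param1"] == param1) & (inputdf["param2"] == param2)
    matched = inputdf.loc[mask, ["SId", "Opacity"]]
    # Assumption: at most one row per SId remains after filtering.
    # If there are duplicates, the last row silently wins here.
    # tolist() yields plain Python scalars, which json.dump can serialize.
    return dict(zip(matched["SId"].tolist(), matched["Opacity"].tolist()))


def apply_opacity(features: list, lookup: dict) -> list:
    """Set style.opacity on each feature, defaulting to 0 when no row matched."""
    for feature in features:
        props = feature["properties"]
        props["style"] = {"opacity": lookup.get(props["cellId"], 0)}
    return features
```

Inside writeGeoJSON this would replace the body of the loop: build the lookup once from inputdf, then call apply_opacity(data['features'], lookup) before dumping the result.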

That said, if you need even better execution times (most probably not necessary), try using the multiprocessing module to parallelize the processing part of the code. You can split the workload of the for loop across several processes.

Try using a multiprocessing Pool's apply_async or map on blocks of iterations to speed things up!
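A minimal sketch of what splitting the loop into blocks and handing them to Pool.map could look like. The function names and the `lookup` dictionary (a hypothetical mapping from cellId to opacity, prepared beforehand) are assumptions for illustration, not part of the original code. For only ca. 7000 features, the cost of starting processes and pickling the data may well outweigh any gain, so measure before adopting this.

```python
from functools import partial
from multiprocessing import Pool


def chunked(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def style_chunk(features, lookup):
    """Worker: set style.opacity on one block of features and return it."""
    for feature in features:
        props = feature["properties"]
        # Hypothetical lookup: cellId -> opacity, 0 when the cell has no match
        props["style"] = {"opacity": lookup.get(props["cellId"], 0)}
    return features


def style_parallel(features, lookup, workers=4, chunk_size=1000):
    """Distribute blocks of features over a process pool with Pool.map."""
    blocks = chunked(features, chunk_size)
    with Pool(processes=workers) as pool:
        results = pool.map(partial(style_chunk, lookup=lookup), blocks)
    # Workers operate on pickled copies, so the modified features must be
    # collected from the return values rather than read from the input list.
    return [feature for block in results for feature in block]
```

On platforms that start worker processes by spawning (for example Windows), the call to style_parallel has to sit under an `if __name__ == "__main__":` guard in the calling script.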
