Sridhar-Sarnobat - 3 months ago
Python Question

How to pipe multi-line JSON Objects into separate python invocations

I know the basics of piping stdin to downstream processes in the shell, and as long as each line is treated individually, or the whole input is treated as one unit, I can get my pipelines to work.

But when I want to read 4 lines of stdin, do some processing, then read 6 more lines and do the same, my limited understanding of pipelines becomes an issue.

For example, in the pipeline below, each curl invocation produces an unknown number of lines of output that together constitute one JSON object:

cat geocodes.txt \
| xargs -I% -n 1 curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
| python -c "import json,sys;obj=json.load(sys.stdin);print obj['results'][0]['address_components'][3]['short_name'];"


How can I consume exactly one JSON object per python invocation? Note that I have negligible experience in Python and more experience with Node.js (would it be better to use Node.js to process the JSON curl output?)

geocodes.txt would contain something like:

51.5035705555556,-3.15153263888889
51.5035400277778,-3.15153477777778
51.5035285833333,-3.15150258333333
51.5033861111111,-3.15140833333333
51.5034980555556,-3.15146016666667
51.5035285833333,-3.15155505555556
51.5035362222222,-3.15156338888889
51.5035362222222,-3.15156338888889


EDIT
I have a nasty feeling that the answer is that you need to read line by line and check whether you have a complete object before parsing. Is there a function which will do the hard work for me?

EDIT 2
This might help, but if the answer below does what I want I won't pursue it for now. JSON grouping:
http://trentm.com/json/#FEATURE-Grouping

Answer

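I believe this approach accomplishes what you want: it runs curl and python together once per input line, so each python invocation sees exactly one JSON object on stdin. First, save your Python script in a file, my_script.py for example. Then do the following: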

cat geocodes.txt \
  | xargs  -I% sh -c "curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' | python my_script.py"

Where my_script.py is:

import json, sys

# Parse the single JSON object that curl wrote to stdin.
obj = json.load(sys.stdin)
print(obj['results'][0]['address_components'][3]['short_name'])

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff

Seems a bit hacky, I'll admit.
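
If you would rather keep a single python process at the end of the original pipeline, the standard library's json.JSONDecoder.raw_decode can do the hard work of finding where each object ends. The following is only a minimal sketch, assuming the concatenated curl output fits in memory. iter_json_objects and short_names are hypothetical helper names of mine, and the key path is the one from the question:

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def short_names(text):
    """Yield the field the question prints, once per JSON object in text."""
    for obj in iter_json_objects(text):
        yield obj['results'][0]['address_components'][3]['short_name']
```

At the end of the question's pipeline this would be driven by something like `for name in short_names(sys.stdin.read()): print(name)`, after an `import sys`. Because it buffers all of stdin first, nothing is printed until every curl call has finished.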


ORIGINAL ANSWER

I am no bash wizard, so my instinct is to simply do everything in Python. The following script illustrates that approach in Python 3:

import json
import urllib.parse as parse
import urllib.request as request

serviceurl = "http://maps.googleapis.com/maps/api/geocode/json?"

with open("geocodes.txt") as f:
    for line in f:
        latlng = line.strip()  # drop the trailing newline so it is not URL-encoded
        if not latlng:
            continue  # skip blank lines
        url = serviceurl + parse.urlencode({'latlng': latlng, 'sensor': 'true'})
        # One request, and therefore exactly one JSON object, per coordinate pair.
        with request.urlopen(url) as response:
            bytes_data = response.read()
        obj = json.loads(bytes_data.decode('utf-8'))
        print(obj['results'][0]['address_components'][3]['short_name'])

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
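
One detail in this approach is easy to miss: lines read from a file keep their trailing newline, and urlencode would percent-encode it as %0A, so the coordinate needs stripping before the URL is built. Here is a small sketch of that step in isolation, where build_url and SERVICE_URL are hypothetical names of mine and the URL is the one used above:

```python
import urllib.parse as parse

SERVICE_URL = "http://maps.googleapis.com/maps/api/geocode/json?"


def build_url(line):
    """Build the geocode request URL for one 'lat,lng' line of geocodes.txt."""
    latlng = line.strip()  # drop the trailing newline before encoding
    return SERVICE_URL + parse.urlencode({'latlng': latlng, 'sensor': 'true'})
```

Keeping the URL construction in its own function also makes it possible to check the encoding without making any network requests.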