Svarto - 2 months ago

Scrapy json response: wildcards and scraping references

I am trying to scrape a JSON response with Scrapy. Is it possible to use a wildcard path in the JSON that finds the nested value "Metro" and pulls the "distance" from within that hierarchy?

In the JSON, there are several poi objects, but I am only interested in the Metro one, and the distance to the Metro. Please see below for the example I am trying to scrape.

I tried the following code, but it doesn't work: the wildcard isn't valid and the reference is incorrect. I am used to XPath scraping, so I'm hoping there is an easy way to do this.

loader.add_value('Metro', jsonresponse["poi"][*][["name"]== "Metro"]["distance"])
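Python's subscript syntax has no wildcard step (`[*]`) and no predicate step (`[name == ...]`), so the line above is not valid Python rather than a query. The XPath idea of "every poi item whose name is Metro, then its distance" maps onto an ordinary loop over the parsed list. A minimal sketch, assuming the response body is JSON text with a top-level "poi" array (in a Scrapy callback that text would typically be `response.text`; recent Scrapy releases also offer `response.json()`). `metro_distances` and the sample body are hypothetical:

```python
import json


def metro_distances(body, name="Metro"):
    """Return the "distance" of every poi entry whose "name" equals `name`.

    Mirrors an XPath-style query like poi/*[name="Metro"]/distance:
    iterate over the list (the wildcard), keep matching entries (the
    predicate), then pick one field. Like an XPath node-set, the result
    is a list of all matches, possibly empty.
    """
    data = json.loads(body)  # parse the raw JSON text into dicts and lists
    return [poi["distance"] for poi in data["poi"] if poi.get("name") == name]


# Hypothetical sample body, trimmed from the question's JSON
sample = '{"poi": [{"distance": 1469.0, "name": "Metro"}, {"distance": 2163.0, "name": "Tram"}]}'
print(metro_distances(sample))  # [1469.0]
```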


The relevant part of the JSON (the "poi" array):

"poi":[
{
"distance":1469.0,
"description":"Station",
"walkDistance":1948,
"url":"",
"lon":14,
"time":1890,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/263",
"name":"Metro"
},
{
"distance":2163.0,
"description":"Station",
"walkDistance":4371,
"url":"",
"lon":14,
"time":4200,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/155",
"name":"Tram"
},
{
"distance":33.0,
"description":"Station",
"walkDistance":40,
"url":"",
"lon":14,
"time":36,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/198",
"name":"Bus MHD"
},
{
"distance":1413.0,
"description":"Station",
"walkDistance":2615,
"url":"",
"lon":14,
"time":2382,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/169",
"name":"Vlak"
},
{
"distance":487.0,
"description":"Bankomat",
"walkDistance":968,
"url":"url",
"lon":14,
"time":943,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/28",
"name":"Bankomat"
},
{
"distance":473.0,
"description":"Station",
"walkDistance":614,
"url":"url",
"lon":14,
"time":574,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/122",
"name":"Police"
},
{
"distance":188.0,
"description":"Station",
"walkDistance":250,
"url":"url",
"lon":14,
"time":253,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/72",
"name":"Apothecary"
},
{
"distance":286.0,
"description":"Station",
"walkDistance":400,
"url":"url",
"lon":14,
"time":381,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/144",
"name":"Sport"
},
{
"distance":286.0,
"description":"Station",
"walkDistance":400,
"url":"url",
"lon":14,
"time":381,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/133",
"name":"Restaurant"
},
{
"distance":64.0,
"description":"Station",
"walkDistance":233,
"url":"url",
"lon":14,
"time":216,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/423",
"name":"Supermarket"
},
{
"distance":168.0,
"description":"Station",
"walkDistance":320,
"url":"url",
"lon":14,
"time":295,
"lat":50,
"imgUrl":"https://api.mapy.cz/poiimg/icon/142",
"name":"School"
}
]
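When several of these POIs are needed (Metro, Tram, Bus MHD and so on), one option is to index the array by "name" once and then use plain dictionary lookups. A minimal sketch, assuming the "name" values are unique within one response (with duplicates, the last entry wins); the trimmed `body` string is a hypothetical sample:

```python
import json

# Trimmed copy of the "poi" array above, wrapped in an object so it parses
body = '''{"poi": [
  {"distance": 1469.0, "name": "Metro"},
  {"distance": 2163.0, "name": "Tram"},
  {"distance": 33.0, "name": "Bus MHD"}
]}'''

data = json.loads(body)

# Index every poi entry by its "name"; later lookups are plain dict access.
# Assumes names are unique: a repeated name would overwrite the earlier entry.
distance_by_name = {poi["name"]: poi["distance"] for poi in data["poi"]}

metro = distance_by_name.get("Metro")  # 1469.0
tram = distance_by_name.get("Tram")    # 2163.0
taxi = distance_by_name.get("Taxi")    # None: no such entry
```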

Answer

If you want a one-liner, then you can go with this:

distance = [x['distance'] for x in jsonresponse['poi'] if x['name'] == 'Metro'][0]
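The `[0]` at the end raises IndexError whenever a response has no Metro entry. A variant using the built-in `next()` with a default stops at the first match and returns the default instead of raising. A minimal sketch; `poi_distance` and the sample `jsonresponse` dict are hypothetical:

```python
def poi_distance(poi_list, name, default=None):
    """Return the distance of the first poi entry called `name`, or `default`.

    next() consumes the generator only up to the first match and returns
    `default` instead of raising IndexError (as `[...][0]` would) when
    nothing matches.
    """
    return next((poi["distance"] for poi in poi_list if poi.get("name") == name), default)


# Hypothetical parsed response, trimmed from the question's JSON
jsonresponse = {"poi": [{"distance": 1469.0, "name": "Metro"}, {"distance": 2163.0, "name": "Tram"}]}

distance = poi_distance(jsonresponse["poi"], "Metro")   # 1469.0
missing = poi_distance(jsonresponse["poi"], "Airport")  # None
```

In the spider, the result can then be passed to the loader as in the question, e.g. `loader.add_value('Metro', distance)`.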