Brian C. - 30 days ago
HTTP Question

How to make asynchronous HTTP GET requests in Python and pass response object to a function

Update: The problem turned out to be incomplete documentation: the event dispatcher passes kwargs to the hook function, so the hook has to accept them.

I have a list of about 30k URLs that I want to check for various strings. I have a working version of this script using Requests & BeautifulSoup, but it doesn't use threading or asynchronous requests so it's incredibly slow.

Ultimately, what I would like to do is cache the HTML for each URL so I can run multiple checks without making redundant HTTP requests to each site. If I have a function that will store the HTML, what's the best way to asynchronously send the HTTP GET requests and then pass the response objects to it?

I've been trying to use Grequests (as described here) and the "hooks" parameter, but I'm getting errors and the documentation doesn't go into much depth. So I'm hoping someone with more experience can shed some light.

Here's a simplified example of what I'm trying to accomplish:

import grequests

urls = ['http://www.google.com/finance','http://finance.yahoo.com/','http://www.bloomberg.com/']

def print_url(r):
    print r.url

def async(url_list):
    sites = []
    for u in url_list:
        rs = grequests.get(u, hooks=dict(response=print_url))
        sites.append(rs)
    return grequests.map(sites)

print async(urls)


And it produces the following TypeError:

TypeError: print_url() got an unexpected keyword argument 'verify'
<Greenlet at 0x32803d8L: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x00000000028D2160>>(stream=False)> failed with TypeError


Not sure why it's sending 'verify' as a keyword argument by default; it would be great to get something working though, so if anyone has any suggestions (using grequests or otherwise) please share :)

Thanks in advance.

Answer Source

I tried your code and got it to work by adding a **kwargs parameter to your print_url function.

def print_url(r, **kwargs):
    print r.url
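
Applied to the caching goal in the question, the same fix might look like the sketch below. The names html_cache, store_html and fetch_all are made up for illustration, and the pool size of 20 is an arbitrary guess rather than a recommendation. grequests is imported inside the function so the hook itself can be tried without it. The code avoids print so it should run on Python 2 and 3 alike.

```python
html_cache = {}  # hypothetical in-memory cache: URL -> HTML text


def store_html(response, **kwargs):
    """Response hook: accept the extra keyword arguments and cache the body."""
    html_cache[response.url] = response.text


def fetch_all(url_list, pool_size=20):
    """Send the GET requests concurrently; store_html runs for each response."""
    import grequests  # third-party; imported here so the hook works without it

    pending = [grequests.get(u, hooks={'response': store_html})
               for u in url_list]
    # size limits how many requests are in flight at once (assumed value)
    return grequests.map(pending, size=pool_size)
```

After fetch_all(urls) returns, the checks can be run against html_cache without hitting the sites again. Note that the cache is keyed by response.url, which may differ from the requested URL if the site redirects.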

I figured out what was wrong from this other Stack Overflow question: Problems with hooks using Requests Python package.

It seems that when you use the response hook in grequests, the callback is called with extra keyword arguments (such as verify in your traceback), so its definition needs to accept **kwargs.
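
To see why the signature matters, here is a stand-alone sketch that imitates a dispatcher forwarding keyword arguments to a hook. The dispatch_hook function below is a simplified stand-in written for this example, not the library's actual code, and FakeResponse and the keyword values are made up. It only assumes, going by the traceback in the question, that arguments such as verify get passed along.

```python
def dispatch_hook(hook, hook_data, **kwargs):
    """Simplified stand-in: call the hook with the data plus extra kwargs."""
    result = hook(hook_data, **kwargs)
    # Keep the original data if the hook returns nothing
    return hook_data if result is None else result


class FakeResponse(object):
    """Hypothetical minimal response object with only a url attribute."""
    def __init__(self, url):
        self.url = url


seen_urls = []


def strict_hook(r):
    # Takes only the response, so any extra keyword argument raises TypeError
    seen_urls.append(r.url)


def tolerant_hook(r, **kwargs):
    # Swallows whatever extra keyword arguments the dispatcher sends
    seen_urls.append(r.url)


resp = FakeResponse('http://example.com/')

try:
    dispatch_hook(strict_hook, resp, verify=True, stream=False)
    strict_failed = False
except TypeError:
    strict_failed = True

dispatch_hook(tolerant_hook, resp, verify=True, stream=False)
```

Here strict_hook fails in the same way as print_url in the question, while tolerant_hook runs because **kwargs absorbs the arguments it does not care about.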