Update: the problem was incomplete documentation; the event dispatcher passes extra kwargs to the hook function, so the hook must accept them.
I have a list of about 30k URLs that I want to check for various strings. I have a working version of this script using Requests & BeautifulSoup, but it doesn't use threading or asynchronous requests so it's incredibly slow.
Ultimately what I would like to do is cache the html for each URL so I can run multiple checks without making redundant HTTP requests to each site. If I have a function that will store the html, what's the best way to asynchronously send the HTTP GET requests and then pass the response objects?
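For the caching part, one simple approach is an in-memory dict keyed by URL, filled in by the response hook. A minimal sketch of that idea (the `cache_response` name, the `cache` dict, and the `FakeResponse` stand-in are mine, just for illustration):

```python
# In-memory cache keyed by URL; illustrative names, not from any library.
cache = {}

def cache_response(r, **kwargs):
    # Store the body once; later string checks read from the cache
    # instead of re-fetching the page.
    cache[r.url] = r.text

class FakeResponse:
    """Stand-in for a requests.Response, just for demonstration."""
    def __init__(self, url, text):
        self.url = url
        self.text = text

# The dispatcher passes extra keyword args (e.g. verify=), which
# the **kwargs in cache_response silently absorbs.
cache_response(FakeResponse('http://example.com/', '<html>hi</html>'), verify=True)
print(cache['http://example.com/'])  # -> <html>hi</html>
```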
I've been trying to use Grequests (as described here) and the "hooks" parameter, but I'm getting errors and the documentation doesn't go very in-depth. So I'm hoping someone with more experience can shed some light.
Here's a simplified example of what I'm trying to accomplish:
urls = ['http://www.google.com/finance',
        'http://finance.yahoo.com/',
        'http://www.bloomberg.com/']
sites = []

def print_url(r):
    print r.url

for u in urls:
    rs = grequests.get(u, hooks=dict(response=print_url))
    sites.append(rs)
TypeError: print_url() got an unexpected keyword argument 'verify'
<Greenlet at 0x32803d8L: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x00000000028D2160>>(stream=False)> failed with TypeError
I tried your code and could get it to work by adding an additional **kwargs parameter to your print_url function.
def print_url(r, **kwargs): print r.url
I figured out what was wrong from this other Stack Overflow question: Problems with hooks using Requests Python package.
It seems that when you use the response hook in grequests, you need to add **kwargs to your callback definition so it can absorb the extra keyword arguments the dispatcher passes along.
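To see why, here is a toy dispatcher in the same spirit as the one inside requests (a simplified stand-in, not the actual library code): it invokes the hook with extra keyword arguments such as verify and stream, so a callback that only accepts the response fails with exactly the TypeError shown above.

```python
def dispatch_hook(hook, r, **kwargs):
    # Simplified stand-in for requests' internal hook dispatch:
    # the hook gets the response plus whatever keyword args the
    # request was made with (verify=, stream=, ...).
    return hook(r, **kwargs)

def bad_hook(r):             # no **kwargs -> TypeError when dispatched
    return r

def good_hook(r, **kwargs):  # absorbs verify=, stream=, etc.
    return r

try:
    dispatch_hook(bad_hook, 'response', verify=False, stream=False)
except TypeError as e:
    print('bad_hook failed:', e)

print(dispatch_hook(good_hook, 'response', verify=False, stream=False))
```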