loki loki - 7 months ago

How can I add a different proxy to Scrapy for every request (or thread)?

Our spider follows links and parses them with a `parse_page` function, which returns an item. How can I assign a different proxy to each request before the first call of `parse_page`?

For example, I have a pool of 250 proxies and want to randomly select one for each request.


You can create a downloader middleware for this. For example:

import base64

# Start your middleware class
class ProxyMiddleware(object):

    # Scrapy calls this for every outgoing request
    def process_request(self, request, spider):

        # Set the location of the proxy
        request.meta['proxy'] = "http://123.456.789.012"

        # Use the following lines if your proxy requires authentication;
        # basic auth expects the "user:password" form
        proxy_user_pass = "USER:PASS"

        # Set up basic authentication for the proxy
        # (b64encode takes bytes, so encode/decode around it)
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

I believe you could easily randomize the proxy URL, username, and password by modifying the above code. Let me know if you need any additional assistance.
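As a minimal sketch of that randomized variant: pick a random entry from a proxy pool on every request. The `PROXIES` list, its entries, and the `RandomProxyMiddleware` name here are assumptions for illustration, not part of your project.

```python
import base64
import random

# Hypothetical proxy pool; in practice this might be loaded from a file
# or from settings. Entries with user_pass=None need no authentication.
PROXIES = [
    {"url": "http://10.0.0.1:8080", "user_pass": "user1:pass1"},
    {"url": "http://10.0.0.2:8080", "user_pass": None},
]

class RandomProxyMiddleware(object):
    def process_request(self, request, spider):
        # Choose a fresh random proxy for every outgoing request
        proxy = random.choice(PROXIES)
        request.meta['proxy'] = proxy["url"]
        if proxy["user_pass"]:
            encoded = base64.b64encode(proxy["user_pass"].encode()).decode()
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded
```

Remember to enable the middleware in `settings.py`, e.g. `DOWNLOADER_MIDDLEWARES = {'myproject.middlewares.RandomProxyMiddleware': 350}` (the module path `myproject.middlewares` is an assumption; use your project's actual path).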