Truub Truub - 5 months ago 182
Python Question

Python Requests library with proxies - GET request still sends my own IP

I am trying to do some web scraping for a study project. Unfortunately, I need to scrape some data from Google Scholar, which blocks my requests. I have tried using (multiple) HTTP proxies, but my requests still get blocked after ~300 tries.

The resulting html from the blocked requests contains:

IP address: 145.109...<br/>Time: 2016-05-05T09:23:37Z<br/>URL:
https://scholar.google.nl/citations?hl=en&amp;view_op=search_authors
&amp;mauthors=Perry<br/>


The above IP is my own, while my proxies dict (it selects a proxy from a list at random) and get request look like this:

proxies = {'http': 'http://<username>:<password>@107.182....:<port>'}

result = requests.get(
    'https://scholar.google.nl/citations?hl=en&view_op=search_authors&mauthors=Perry',
    proxies=proxies, headers=headers)


The proxy IPs are of course valid and working, and my own IP is not included in the proxy list. Am I doing something wrong?

Edit: For completeness, I have also tried setting authentication like this answer suggests, but the result is the same.

Answer

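In your proxies dict the URL scheme doesn't match the one you're using for your request: you have an http entry for your proxies but then make an https request. Requests picks the proxy by the scheme of the request URL, so with no https entry the request goes out directly from your own IP. If you add a proxy for the https scheme, it should work.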
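
As a rough illustration of why the missing https key matters, here is a minimal sketch. The helper names `build_proxies` and `proxy_for` and the proxy address are hypothetical placeholders, and `proxy_for` is only a simplified model of the per-scheme lookup, not the actual Requests implementation (which also honours keys such as `all` and per-host entries).

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url):
    """Map both schemes to the same proxy so HTTPS requests are proxied too."""
    return {'http': proxy_url, 'https': proxy_url}


def proxy_for(url, proxies):
    """Simplified model of the per-scheme lookup; None means a direct connection."""
    return proxies.get(urlsplit(url).scheme)


# Hypothetical placeholder proxy; substitute your own credentials, host and port.
PROXY = 'http://user:password@proxy.example.com:8080'
URL = 'https://scholar.google.nl/citations?hl=en&view_op=search_authors&mauthors=Perry'

# Only an 'http' entry: the https URL finds no proxy and would go out directly.
print(proxy_for(URL, {'http': PROXY}))       # None

# Both schemes covered: the https URL is routed through the proxy.
print(proxy_for(URL, build_proxies(PROXY)))
```

With a dict built this way, the original call would become requests.get(URL, proxies=build_proxies(PROXY), headers=headers), assuming the proxy itself supports tunnelling HTTPS traffic.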