sam.roberts55 - 5 months ago
Ruby Question

Using a proxy with a Rails URL link

I have a Nokogiri web scrape running perfectly on my local machine.

However, when I run the same scrape in my production environment, I get a 403 error code.

I believe this is because the website is blocking my server's IP (probably because previous users of that IP got it blocked).

Is it possible to route the Nokogiri request from my web server through a proxy server? If so, how would I go about it?

This is the code I have at the moment.

doc = Nokogiri::HTML(open(URL HERE, 'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2'))

Answer

Actually, you can simply use the :proxy option of the OpenURI open method. From the OpenURI documentation:

open(*rest, &block)
#open provides `open' for URI::HTTP and URI::FTP.

...

The hash may include other options, where keys are symbols:
:proxy

Synopsis:    
:proxy => "http://proxy.foo.com:8000/"
:proxy => URI.parse("http://proxy.foo.com:8000/")

If :proxy option is specified, the value should be String, URI, boolean or nil.
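
Applied to the code in the question, this could look roughly like the sketch below. It assumes Ruby 2.5 or later (where URI.open is available). The proxy address is just the placeholder from the docs above, and proxy_options and fetch_html are hypothetical helper names, not part of OpenURI.

```ruby
require 'open-uri'

# Placeholder values: the proxy address is the example from the OpenURI docs,
# not a real proxy. Substitute your own.
PROXY_URL  = 'http://proxy.foo.com:8000/'
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 ' \
             '(KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2'

# Hypothetical helper: builds the options hash for OpenURI.
# String keys are sent as request headers; symbol keys such as :proxy
# are treated as OpenURI options.
def proxy_options(proxy_url = PROXY_URL, user_agent = USER_AGENT)
  { 'User-Agent' => user_agent, :proxy => URI.parse(proxy_url) }
end

# Hypothetical helper: fetches the page body through the proxy.
def fetch_html(url, options = proxy_options)
  URI.open(url, options) { |io| io.read }
end

# Usage (needs the nokogiri gem and a working proxy):
#   doc = Nokogiri::HTML(fetch_html('http://example.com/'))
```

The options hash is built separately so you can check or swap the proxy without making a request.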

Also, as a general consideration (at the risk of being tedious), you should look for alternatives to scraping content, especially if it's done on a regular basis: things like a supported API or alternative sources. If your current server IP got blocked, the same can happen to the proxy.