marflar marflar - 6 months ago 33
Ruby Question

Ruby Net::HTTP - following 301 redirects

My users submit urls (to mixes on mixcloud.com) and my app uses them to perform web requests.

A good url returns a 200 status code:

require 'net/http'
uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix/")
response = Net::HTTP.get_response(uri)
# => #<Net::HTTPOK 200 OK readbody=true>


But if you forget the trailing slash then our otherwise good url returns a 301:

uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix")
response = Net::HTTP.get_response(uri)
# => #<Net::HTTPMovedPermanently 301 MOVED PERMANENTLY readbody=true>


The same thing happens with URLs that should return a 404:

# bad path returns a 404
"http://www.mixcloud.com/bad/path/"
# bad path minus trailing slash returns a 301
"http://www.mixcloud.com/bad/path"



  1. How can I 'drill down' into the 301 to see if it takes us on to a valid resource or an error page?

  2. Is there a tool that provides a comprehensive overview of the rules that a particular domain might apply to their urls?


Answer

301 redirects are fairly common when the URL is not typed exactly as the web server expects it. They happen much more often than you'd think; you just don't notice them while browsing because the browser follows them automatically.

Two alternatives come to mind:

1: Use open-uri

open-uri handles redirects automatically. So all you'd need to do is:

require 'open-uri'
...
# URI.open needs Ruby 2.5+; Kernel#open no longer accepts URLs as of Ruby 3.0.
response = URI.open('http://xyz...').read
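
To answer the first question with open-uri, here is a minimal sketch. The helper name check_url and the shape of the hash it returns are my own assumptions, not part of open-uri. The idea is that a successful fetch exposes the final URL through base_uri, while a redirect that ends on an error page raises OpenURI::HTTPError, whose io carries the status.

```ruby
require 'open-uri'

# Hypothetical helper (not part of open-uri): fetch a URL, letting open-uri
# follow any redirects, and report where we ended up.
def check_url(url)
  URI.open(url) do |io|
    # io.status is [code, message]; io.base_uri is the URL after redirects.
    { ok: true, status: io.status.first, final_url: io.base_uri.to_s }
  end
rescue OpenURI::HTTPError => e
  # Raised for non-success responses such as a 404 at the end of a redirect.
  { ok: false, status: e.io.status.first, final_url: nil }
end

# Usage (needs network access, so it is left commented out):
# p check_url("http://www.mixcloud.com/ErolAlkan/hard-summer-mix")
```

Network-level failures (DNS errors, timeouts) are not rescued here and would still raise.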

2: Handle redirects with Net::HTTP

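require 'net/http'

# Follows a single 301 redirect; other 3xx codes and redirect chains are not handled.
def get_response_with_redirect(uri)
  r = Net::HTTP.get_response(uri)
  if r.code == "301"
    # The Location header holds the URL we are being redirected to.
    r = Net::HTTP.get_response(URI.parse(r['location']))
  end
  r
end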
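
If the server chains several redirects, or sends a relative Location header, a single hop is not enough. Below is a rough sketch of a loop with a hop limit. The name fetch_final_response, the limit of 5, and the fetcher parameter are all assumptions of mine; fetcher exists only so the logic can be tried without a network and defaults to Net::HTTP.get_response.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: make at most `limit` requests, following redirects,
# and return the final URI together with its response.
def fetch_final_response(url, limit: 5, fetcher: Net::HTTP.method(:get_response))
  uri = URI.parse(url)
  limit.times do
    response = fetcher.call(uri)
    location = response['location']
    # Stop as soon as we get something that is not a redirect with a target.
    return [uri, response] unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  raise "Too many redirects (more than #{limit} requests) for #{url}"
end

# Usage (needs network access, so it is left commented out):
# final_uri, response = fetch_final_response("http://www.mixcloud.com/bad/path")
# puts "#{final_uri} -> #{response.code}"
```

The returned response tells you whether the redirect landed on a valid resource (Net::HTTPOK) or an error page (for example Net::HTTPNotFound).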

If you want to be even smarter, you could try adding or removing the trailing slash when you get a 404 response. You could do that with a method like get_response_smart, which handles this URL fiddling in addition to the redirects.
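
One possible shape for get_response_smart, as a sketch only. The toggle_trailing_slash helper and the fetcher parameter are hypothetical names I made up for illustration; fetcher defaults to Net::HTTP.get_response. It follows a single 301, and on a 404 it retries once with the trailing slash flipped.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: add a trailing slash if the path lacks one,
# otherwise remove it. The root path is returned unchanged.
def toggle_trailing_slash(uri)
  path = uri.path
  return uri if path.empty? || path == '/'
  toggled = uri.dup
  toggled.path = path.end_with?('/') ? path.chomp('/') : "#{path}/"
  toggled
end

def get_response_smart(uri, fetcher: Net::HTTP.method(:get_response))
  response = fetcher.call(uri)
  if response.is_a?(Net::HTTPMovedPermanently) && response['location']
    # Follow one redirect, resolving a relative Location if needed.
    response = fetcher.call(URI.join(uri.to_s, response['location']))
  elsif response.is_a?(Net::HTTPNotFound)
    alternative = toggle_trailing_slash(uri)
    if alternative != uri
      retried = fetcher.call(alternative)
      # Keep the original 404 unless the alternative URL did better.
      response = retried unless retried.is_a?(Net::HTTPNotFound)
    end
  end
  response
end
```

Note that the retried response could itself be a redirect; this sketch does not follow it further.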
