Ludwig T. Ludwig T. - 2 months ago 16
HTTP Question

Untraceable HTTP redirection?

I'm currently working on a project to track products from several websites. I use a python scraper to retrieve all the URLs related to the listed products, and later, regularly check if these URLs are still active.

To do so I use the Python requests module, run a get request and look at the response's status code. Usually I get 200, 301, 302 or 404 as expected, except in the following case:

http://www.sephora.fr/Parfum/Parfum-Femme/Totem-Orange-Eau-de-Toilette/P2232006

This product has been removed and while opening the link (sorry it's in French), I am briefly shown a placeholder page saying the product is not available anymore and then redirected to the home page (www.sephora.fr).

Oddly, Python still returns a 200 status code and so do various redirect tracers such as wheregoes.com or redirectdetective.com. The worst part is that the response URL still is the original, so I can't even trace it that way.

When analyzing with Chrome DevTools and preserving the logs, I see that at some point the page is reloaded. However I'm unable to find out where.

I'm guessing this is done client-side via Javascript, but I'm not quite sure how. Furthermore, I'd really need to be able to detect this change from within Python.

As a reference, here's a link to a working product:

http://www.sephora.fr/Parfum/Parfum-Femme/Kenzo-Jeu-d-Amour-Eau-de-Parfum/P1894014

Any leads?

Thank you !
Ludwig

Answer

The page has a meta tag, that redirects the page to the root URL:

<meta http-equiv="refresh" content="0; URL=/" />