seeker_of_bacon - 1 year ago 391
Python Question

Writing an Instagram crawler with Scrapy. How can I go to the next page?

As an exercise, I decided to write a Python script that would get all the images of a specified user. I'm somewhat familiar with Scrapy, which is why I chose it as the scraping tool. Currently the script can download images only from the first page (12 max).

From what I can tell, Instagram pages are generated by JavaScript. The response Scrapy receives (which is like the page source viewed in Chrome) does not show the HTML structure that Chrome's Inspector does. In Chrome, after 12 images, there's a button at the bottom with a link to the next page.

For example, Link to page 2 is On page 2 there's a link to page 3 with

How can I grab that number in Scrapy so I can send my spider there? The response doesn't even contain that number. Is there another way to reach the next page?

I know the Instagram API would provide some benefits, but I thought it could be done without all those tokens.

Answer Source

According to the robots.txt policy, you should avoid crawling the /api/, /publicapi/ and /query/ paths, so crawl carefully (and responsibly) when following the user pagination.
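To make that rule concrete, here is a minimal sketch that checks a URL against the three disallowed paths using the standard library's robots.txt parser. The rule lines are an assumption reconstructed from the paths named above (the live robots.txt may differ), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt: only the three paths mentioned in this answer.
# The real robots.txt may contain more (or different) rules.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /api/",
    "Disallow: /publicapi/",
    "Disallow: /query/",
]


def is_allowed(url, user_agent="*"):
    """Return True if the assumed rules permit fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)  # parse rule lines directly, no network access
    return parser.can_fetch(user_agent, url)
```

Inside a Scrapy project, setting ROBOTSTXT_OBEY = True in settings.py makes Scrapy fetch the site's real robots.txt and apply it for you, which is probably the simpler option.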

Also, from what I can see, pagination starts with a "Load more" button, which in fact triggers a POST request (that you need to inspect) carrying only two necessary values: owner and end_cursor.

Those values can be found in the body of the original response, inside '//script[contains(., "sharedData")]/text()'
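As a rough sketch of how those values might be pulled out: the code below assumes the script text has the form of a JavaScript assignment followed by one JSON object (for example `window._sharedData = {...};`), and it searches the decoded object by key name because the exact nesting is not given here. `parse_shared_data` and `find_key` are hypothetical helper names, and which key actually holds the owner value is something you would need to confirm against the real page.

```python
import json


def parse_shared_data(script_text):
    """Decode the first JSON object embedded in the script text.

    Assumes the text looks like `<something> = {...};`. raw_decode stops
    at the end of the object, so trailing characters such as the
    semicolon are ignored.
    """
    start = script_text.index("{")
    data, _end = json.JSONDecoder().raw_decode(script_text[start:])
    return data


def find_key(node, key):
    """Depth-first search for the first non-None value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Inside a Scrapy callback this could be used roughly as:
#   text = response.xpath(
#       '//script[contains(., "sharedData")]/text()').extract_first()
#   data = parse_shared_data(text)
#   end_cursor = find_key(data, "end_cursor")
```

Searching by key name is a deliberate shortcut: it keeps working if the nesting changes, but it returns the first match only, so check that the value you get is the one the "Load more" request expects.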