Andrew - 7 months ago
Ruby Question

Scraping Reddit using Nokogiri (429 too many requests)

I'm trying to scrape Reddit with Nokogiri, but even a single run of the code below fails with a "too many requests" error.

require 'nokogiri'
require 'open-uri'
url = ""
# Kernel#open no longer accepts URLs as of Ruby 3.0; open-uri provides URI.open instead.
redditscrape = Nokogiri::HTML(URI.open(url))

OpenURI::HTTPError: 429 Too Many Requests

Isn't this only one request? If it's not, how do I create sleep intervals for Nokogiri?


Reddit has an API

You could probably query the API for the particular subreddit(s) you want to scrape. Attempting to scrape all of Reddit seems like a nightmare waiting to happen, considering the high volume of posts and the nested comments.
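As a rough sketch of what querying one subreddit might look like: the code below assumes Reddit's public JSON listing endpoint (`/r/<subreddit>/new.json`) and its usual `data.children[].data.title` layout. The helper names (`listing_url`, `titles_from_listing`, `fetch_titles`) and the User-Agent string are made up for illustration, and Reddit may additionally require OAuth or impose its own limits, so check the API documentation before relying on it.

```ruby
require 'json'
require 'open-uri'

# Hypothetical helper: builds the listing URL for one subreddit.
# The ".json" endpoint and the "limit" parameter are assumptions about Reddit's API.
def listing_url(subreddit, limit: 5)
  "https://www.reddit.com/r/#{subreddit}/new.json?limit=#{limit}"
end

# Extracts post titles from a listing response body (a JSON string).
def titles_from_listing(json_text)
  listing = JSON.parse(json_text)
  listing.fetch('data').fetch('children').map { |child| child.dig('data', 'title') }
end

# Performs the request with a descriptive User-Agent (placeholder value)
# and returns the titles. Not called here, so loading this file makes no request.
def fetch_titles(subreddit, user_agent: 'my-scraper/0.1')
  json_text = URI.open(listing_url(subreddit), 'User-Agent' => user_agent).read
  titles_from_listing(json_text)
end
```

Calling `fetch_titles('ruby')` would then return an array of title strings, with no HTML parsing involved.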


It looks like Reddit is blocking or throttling scrapers of its HTML pages to push people toward its public API, which would explain why even a single request gets a 429.
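On the "sleep intervals" part of the question: Nokogiri only parses, and the request itself comes from open-uri, so any waiting has to wrap the fetch. Below is a minimal sketch of retrying on a 429 with increasing delays. The helper names (`with_retry_on_429`, `fetch_html`), the delay values, and the User-Agent string are all hypothetical, and it is only an assumption that Reddit treats a descriptive User-Agent more leniently than the default one.

```ruby
require 'open-uri'

# Hypothetical helper: runs the block, and on HTTP 429 waits and tries again.
# The wait doubles each time (base_delay, 2 * base_delay, ...).
# `sleeper` is injectable so the waiting can be stubbed out in tests.
def with_retry_on_429(max_attempts: 3, base_delay: 2.0, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue OpenURI::HTTPError => e
    # open-uri puts the status line in the message, e.g. "429 Too Many Requests".
    # Re-raise anything that is not a 429, or a 429 after the last attempt.
    raise unless e.message.start_with?('429') && attempts < max_attempts
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Fetches a page body with a custom User-Agent (placeholder value).
# Not called here, so loading this file makes no request.
def fetch_html(url, user_agent: 'my-scraper/0.1')
  with_retry_on_429 { URI.open(url, 'User-Agent' => user_agent).read }
end
```

The string returned by `fetch_html(url)` can be passed to `Nokogiri::HTML` as before.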