Sorry if this is not the place for this question, but I'm not sure where else to ask.
I'm trying to scrape data from rotogrinders.com and I'm running into some challenges.
In particular, I want to scrape previous NHL game data using URLs of this format (obviously you can change the date to get another day's data):
However, when I get to the page, I notice that the data is broken up into pages, and I'm unsure how to make my script fetch the data that's shown after clicking the "All" button at the bottom of the page.
Is there a way to do this in Python? Perhaps some library that allows button clicks? Or is there some way to get the data without actually clicking the button, by being clever about the URL/request?
Actually, things are not that complicated in this case. When you click "All", no network requests are issued. All the data is already there, inside a script tag in the HTML; you just need to extract it.
Working code using:
requests (to download the page content),
BeautifulSoup (to parse the HTML and locate the desired script element),
re (to extract the desired "player" array from the script), and
json (to load the array string into a Python list):
    import json
    import re

    import requests
    from bs4 import BeautifulSoup

    url = "https://rotogrinders.com/game-stats/nhl-skater?site=draftkings&date=11-22-2016"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # capture the array literal assigned to "var data" inside the script
    pattern = re.compile(r"var data = (\[.*?\]);$", re.MULTILINE | re.DOTALL)
    script = soup.find("script", text=pattern)

    data = pattern.search(script.text).group(1)
    data = json.loads(data)

    # printing player names for demonstration purposes
    for player in data:
        print(player["player"])
Prints:

    Jeff Skinner
    Jordan Staal
    ...
    William Carrier
    A.J. Greer
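
If you want to reuse this for several dates, the same idea can be wrapped in a couple of small helpers. The following is only a sketch, not a tested scraper: `build_url` and `extract_players` are hypothetical names, the URL format is simply copied from the one above, and `SAMPLE_HTML` is a made-up stand-in for a downloaded page so the example runs without network access. It also applies the regex to the raw HTML directly instead of locating the script element with BeautifulSoup first, which assumes `var data = [...];` appears only once on the page.

```python
import json
import re
from datetime import date

# Same pattern as above: capture the array literal assigned to `var data`.
DATA_PATTERN = re.compile(r"var data = (\[.*?\]);$", re.MULTILINE | re.DOTALL)


def build_url(day, site="draftkings"):
    """Build a game-stats URL (format assumed from the single URL shown above)."""
    return (
        "https://rotogrinders.com/game-stats/nhl-skater"
        f"?site={site}&date={day:%m-%d-%Y}"
    )


def extract_players(html):
    """Return the embedded player list, or an empty list if it is not found."""
    match = DATA_PATTERN.search(html)
    if match is None:
        return []
    return json.loads(match.group(1))


# Made-up sample page; the real page is assumed to embed its data the same way.
SAMPLE_HTML = """
<html><body>
<script>
var data = [{"player": "Jeff Skinner"}, {"player": "Jordan Staal"}];
</script>
</body></html>
"""

players = extract_players(SAMPLE_HTML)
names = [p["player"] for p in players]
print(names)
print(build_url(date(2016, 11, 22)))
```

For a real page you would pass `requests.get(build_url(some_date)).text` to `extract_players`. Returning an empty list when nothing matches keeps a loop over many dates from crashing if the page layout changes or a date has no games.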