Honzys - 1 month ago
HTML Question

Downloading dynamically loaded webpage with python

I have this website and I want to download the content of the page.

I tried Selenium, including clicking the button with it, but without success.

#!/usr/bin/env python
from contextlib import closing
from selenium.webdriver import Firefox
import time

# use firefox to get page with javascript generated content
with closing(Firefox()) as browser:
    # setting the url
    browser.get("http://bonusbagging.co.uk/oddsmatching.php#")
    # finding and clicking the button
    button = browser.find_element_by_id('select_button')
    button.click()
    page = browser.page_source
    time.sleep(5)
print(page.encode("utf8"))


This code only downloads the page source, in which the data is hidden.

Can someone show me the right way to do this? Or tell me how the hidden data can be downloaded?

Thanks in advance!

Answer

I always try to avoid Selenium like the plague when scraping; it's very slow and is almost never the best way to go about things. You should dig into the source more before scraping: on this page, it was clear that the HTML was coming in first and then a separate call was being made to get the table's data. Why not make the same call as the page? It's lightning fast and requires no HTML parsing; it just returns raw data, which seems to be what you're looking for. The Python requests library is perfect for this. Happy scraping!

import requests

table_data = requests.get('http://bonusbagging.co.uk/odds-server/getdata_slow.php').content
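If you want something a bit more defensive than the one-liner, here is a minimal sketch of the same call with a timeout and an HTTP status check. The names fetch_table_data and main are made up for illustration, and the timeout of 10 seconds is an arbitrary assumption; only the URL comes from the answer above. The format of the response isn't stated here, so the sketch returns it as plain text rather than trying to parse it.

```python
# URL taken from the answer above; everything else is an illustrative assumption.
DATA_URL = "http://bonusbagging.co.uk/odds-server/getdata_slow.php"


def fetch_table_data(session, url=DATA_URL, timeout=10):
    """Return the endpoint's response body as text, raising on HTTP errors.

    `session` is any object with a requests-style get(url, timeout=...)
    method, for example requests.Session().
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text


def main():
    import requests  # third-party package: pip install requests

    with requests.Session() as session:
        data = fetch_table_data(session)
        print(data[:500])  # peek at the start of the response
```

Calling main() makes the real request. Passing the session in keeps the function easy to try out with a stand-in object, and lets you reuse one connection if you end up polling the endpoint.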

PS: The best way to look for these calls is to open the browser's dev console and check out the Network tab, where you can see what calls the page is making. Another way is to go to the Sources tab, look for some JavaScript, and search for ajax calls (that's where I got the URL I'm calling above; the path was: top/odds-server.com/odds-server/js/table_slow.js). The latter option is sometimes easier and sometimes nearly impossible (if the file is minified/uglified). Do whatever works for you!
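The "search the JavaScript for ajax calls" step can be roughly automated once you have saved a script file locally. Below is a rough sketch that uses only the standard library. The function name find_ajax_urls, the regular expression, and the sample JavaScript are all hypothetical and are not taken from table_slow.js. It is a heuristic that assumes jQuery-style calls or fetch with a literal string URL, so it will miss URLs that are built dynamically and may struggle with minified code.

```python
import re

# Heuristic: a quoted string that follows `url:` in an options object, or that
# is the first argument of $.ajax / $.get / $.post / $.getJSON / fetch.
AJAX_URL_PATTERN = re.compile(
    r"""(?:\burl\s*:\s*|(?:\$\.(?:ajax|get|post|getJSON)|\bfetch)\s*\(\s*)(["'])(.+?)\1"""
)


def find_ajax_urls(js_source):
    """Return candidate request URLs found in a string of JavaScript source."""
    return [match.group(2) for match in AJAX_URL_PATTERN.finditer(js_source)]


# Hypothetical sample input, made up for illustration only.
SAMPLE_JS = """
$.ajax({ url: "/data/getdata.php", type: "GET" });
$.getJSON('/api/items', render);
var label = "not a url";
"""

print(find_ajax_urls(SAMPLE_JS))
```

Treat whatever it finds as candidates to confirm in the Network tab, not as a definitive list.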
