user2988098 - 1 year ago
HTML Question

How can I parse a dynamic page using Python?

I am using Ghost and BeautifulSoup to parse an HTML page. The problem is that the content of this page is dynamic (created with AngularJS). At first the HTML only shows something like "please wait! page loading". After a few seconds the real content appears. Using Ghost and BeautifulSoup I only get the HTML of the loading page, with just 2 small divs. The URL stays the same. Is there a way to wait until the "real" content is loaded?

Answer Source
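from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# PhantomJS support was removed in Selenium 4; on newer versions use a
# headless browser such as webdriver.Chrome() or webdriver.Firefox() instead.
driver = webdriver.PhantomJS()
driver.get("your url here")

# Wait up to 10 seconds for the dynamic content to become visible.
# "content" is a placeholder ID: replace it with the ID of an element
# that only exists once the real page has rendered.
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "content")))

# page_source now holds the rendered DOM rather than the loading page.
data = driver.page_source
driver.quit()  # end the session and shut down the browser process

soup = BeautifulSoup(data, "html.parser")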
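
If it helps to see what the explicit wait is doing: conceptually it just polls a condition until it returns something truthy or a timeout expires. Below is a minimal standard-library sketch of that idea, not Selenium's actual implementation. The names `wait_until`, `FakePage` and `rendered_source` are made up for illustration, and `FakePage` only simulates a page whose content appears after a short delay.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper; it only mimics the polling idea behind an explicit wait.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a dynamic page: shows a loading message first,
    then the real content once `ready_after` seconds have passed."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def source(self):
        if time.monotonic() < self._ready_at:
            return "<div>please wait! page loading</div>"
        return '<div id="content">real content</div>'


page = FakePage(ready_after=0.2)


def rendered_source():
    # Return the HTML only once the element we care about is present.
    source = page.source()
    return source if 'id="content"' in source else None


html = wait_until(rendered_source, timeout=2.0, poll_interval=0.05)
print(html)
```

The important part is that the condition checks for an element that belongs to the finished page, not for the page load itself, since the loading placeholder is already a fully loaded document as far as the browser is concerned.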