Echchama Nayak - 5 months ago
Javascript Question

Extracting JavaScript-generated data from a page

I am trying to extract the coloured text from the page whose URL appears in the code below. I am using BeautifulSoup in Python. The code is as follows:

import time, urllib2, re
from bs4 import BeautifulSoup
url='http://de.vroniplag.wikia.com/wiki/Aaf/008'
def gethtml(link):
    time.sleep(2)
    req = urllib2.Request(link, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    html = con.read()
    return html

soup=BeautifulSoup(gethtml(url),'html.parser')
print soup.findAll('span', attrs={"class": re.compile('fragmark')})


But the returned result is empty. How can I change it to make it work?

UPDATE:

I am now using chromedriver, with the following code:

from selenium import webdriver
import os

chromedriver = "./chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)

driver.get('http://de.vroniplag.wikia.com/wiki/Aaf/008')
for tag in driver.find_elements_by_css_selector('[class^=fragmark]'):
    print(tag.text)
driver.quit()


The browser opens, but no result is printed. When I close the browser, an error occurs.

Answer

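You need a library that can execute JavaScript, because the elements with the classes fragmark1, fragmark2, ... are created by JavaScript in the browser, so they are absent from the raw HTML that urllib2 downloads. Selenium, which drives a real browser, is one option: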

from selenium.webdriver import Chrome as Driver
# Replace `Chrome` with the browser installed on your system

driver = Driver()
driver.get('http://de.vroniplag.wikia.com/wiki/Aaf/008')
for tag in driver.find_elements_by_css_selector('[class^=fragmark]'):
    print(tag.text)
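
If the loop prints nothing, one possible cause is that the page's script has not yet created the fragmark elements when the lookup runs. The following is a minimal sketch of polling for them, assuming Python 3 and a chromedriver that Selenium can locate. The helper name wait_for_elements and its timeout and poll parameters are hypothetical and not part of Selenium. Selenium's own WebDriverWait covers the same need and is likely the better choice in real code. The helper is written by hand here only so that it works with any object that has a Selenium-style find_elements(by, value) method.

```python
import time


def wait_for_elements(driver, css_selector, timeout=10.0, poll=0.5):
    """Poll until css_selector matches at least one element or timeout expires.

    Returns the list of matches, which is empty if nothing appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR
        found = driver.find_elements("css selector", css_selector)
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)


def main():
    # Imported here so the helper above does not itself require Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: chromedriver can be found
    try:
        driver.get('http://de.vroniplag.wikia.com/wiki/Aaf/008')
        for tag in wait_for_elements(driver, '[class^=fragmark]'):
            print(tag.text)
    finally:
        driver.quit()  # close the browser even if an error occurs above


# main()  # uncomment to run against the live page
```

Calling driver.quit() in a finally block lets the script shut the browser down itself, so there is no need to close the window by hand.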