user3425403 user3425403 - 1 year ago 118
Python Question

Need to scrap a table which is loaded through ajax using python(selenium)

I have a page that has a table (table id= "ctl00_ContentPlaceHolder_ctl00_ctl00_GV" class="GridListings" )i need to scrap.
I usually use BeautifulSoup & urllib for it,but in this case the problem is that the table takes some time to load ,so it isnt captured when i try to fetch it using BS.
I cannot use PyQt4,drysracpe or windmill because of some installation issues,so the only possible way is to use Selenium/PhantomJS
I tried the following,still no success:

from import By
from import WebDriverWait
from import expected_conditions as EC
driver = webdriver.PhantomJS()
wait = WebDriverWait(driver, 10)
table = wait.until(EC.presence_of_element_located(By.CSS_SELECTOR, 'table#ctl00_ContentPlaceHolder_ctl00_ctl00_GV'))

The above code doesnt give me the desired contents of the table.
How do i go about achieveing this???

Answer Source

You can get the data using requests and bs4,, with almost if not all asp sites there are a few post params that always need to be provided like __EVENTTARGET, __EVENTVALIDATION etc.. :

from bs4 import BeautifulSoup
import requests

data = {"__EVENTTARGET": "ctl00$ContentPlaceHolder$ctl00$ctl00$RadAjaxPanel_GV",
    "ctl00$ContentPlaceHolder$ctl00$ctl00$ctl00$hdnProductID": "139",
    "ctl00$ContentPlaceHolder$ctl00$ctl00$hdnProductID": "139",
    "ctl00$ContentPlaceHolder$ctl00$ctl00$drpSortField": "Listing Number",
    "ctl00$ContentPlaceHolder$ctl00$ctl00$drpSortDirection": "A-Z, Low-High",
    "__ASYNCPOST": "true"}

And for the actual post, we need to add a few more values to out post data:

post = ""
with requests.Session() as s:
    s.headers.update({"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"})
    soup = BeautifulSoup(s.get(post).content)

    data["__VIEWSTATEGENERATOR"] = soup.select_one("#__VIEWSTATEGENERATOR")["value"]
    data["__EVENTVALIDATION"] = soup.select_one("#__EVENTVALIDATION")["value"]
    data["__VIEWSTATE"] = soup.select_one("#__VIEWSTATE")["value"]

    r =, data=data)
    soup2 = BeautifulSoup(r.content)
    table = soup2.select_one("div.GridListings")

You will see the table printed when you run the code.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download