Ryan Dooley Ryan Dooley - 5 months ago 12
Python Question

Scrape PHP Ajax using Python

I'm a beginner with python, I'm trying build a python program that will scrape product descriptions from http://turnpikeshoes.com/shop/TCF00003. Python has many libraries and I'm sure many approaches to achieving my goal. I've done a few successful scrapes using requests however the fields I was looking for are not showing up, Using chromes inspector I found an Ajax POST request.

Here is my code

from lxml import html
import requests

url = 'http://turnpikeshoes.com/shop/TCF00003'
#URL
headers = {'user-agent': 'my-app/0.0.1'}
#Header info sent to server
page = requests.get(url, headers=headers)
#Get response
tree = html.fromstring(page.content)
#Page Content


ShortDsc = tree.xpath('//span[@itemprop="reviewBody"]/text()')

LongDsc = tree.xpath('//li[@class="productLongDescription"]/text()')

print 'ShortDsc:', ShortDsc
print 'LongDsc:', LongDsc


I think I need to send a request directly to admin-ajax.php

Any help is greatly appreciated

Val Val
Answer

You should try selenium in this case if you want to scrape javascript content:

from selenium import webdriver
import time

driver = webdriver.PhantomJS()
driver.get("http://turnpikeshoes.com/shop/TCF00003")
time.sleep(5)

LongDsc = driver.find_element_by_class_name("productLongDescription").text

print 'LongDsc:', LongDsc

Btw, you should also install PhantomJS as headless browser.