webscrapguy - 4 years ago
Python Question

Python web scraping with requests and BeautifulSoup

I am trying to scrape the PSN store, specifically the link below. I want to grab the names and prices of the games that are on sale.

https://store.playstation.com/#!/en-us/2-for-1/cid=STORE-MSF77008-PLAYCOLLMULTIBUY

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

The data I want is what you see when you right-click on the web page and click Inspect. For Firewatch, for example, it looks like this:

<h3 class="cellTitle">Firewatch</h3>

<li class="buyPrice">$19.99</li>

Now when I print soup.prettify(), I get this:

html,body,div,span,applet,object,iframe,h1,h2,h3,h4,h5,h6,p,blockquote,pre,a,abbr,acronym,address,big,cite,code,del,dfn,em,img,ins,kbd,q,s,samp,small,strike,strong,sub,sup,tt,var,b,u,i,center,dl,dt,dd,ol,ul,li,fieldset,form,label,legend,table,caption,

with none of the actual data.

I must be doing something wrong with the functions, but the guides I am reading and other people's questions all seem to do exactly what I am doing.

Answer Source

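The page builds its content with JavaScript after it loads, so requests only receives the initial HTML and CSS, not the game data. With the help of PhantomJS (http://phantomjs.org/download.html) and Selenium you can render the page first and then read the elements.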
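One way to see why a plain HTTP request comes back without the game data is to look at how the store URL splits up. Everything after the # is a fragment, which is handled in the browser and is not sent to the server. Below is a minimal sketch, assuming Python 3 (for urllib.parse); the variable names requested and client_side_only are only illustrative.

```python
from urllib.parse import urlsplit

url = "https://store.playstation.com/#!/en-us/2-for-1/cid=STORE-MSF77008-PLAYCOLLMULTIBUY"

parts = urlsplit(url)

# What an HTTP client asks the server for: scheme, host and path only.
requested = "{}://{}{}".format(parts.scheme, parts.netloc, parts.path or "/")

# The fragment stays on the client side; the page's JavaScript reads it
# and loads the matching content after the initial page has arrived.
client_side_only = parts.fragment

print("Requested from server:", requested)
print("Handled by JavaScript:", client_side_only)
```

So requests fetches only the store's front page shell, and the part of the URL that selects the 2-for-1 sale never reaches the server.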

Steps:
1. In a terminal or cmd, run: pip install selenium
2. Download PhantomJS and unzip it, then put "phantomjs.exe" in a directory on your path, for example C:\Python27 on Windows.

Then use this code; it should give you the desired result:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By


url = "https://store.playstation.com/#!/en-us/2-for-1/cid=STORE-MSF77008-PLAYCOLLMULTIBUY"

# PhantomJS is a headless browser, so the page's JavaScript actually runs.
# Note: webdriver.PhantomJS needs Selenium 3.x or earlier; it was removed in Selenium 4.
driver = webdriver.PhantomJS()
driver.get(url)

# Wait up to 10 seconds for the first game title to appear in the page.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".cellTitle")))

gamenames = driver.find_elements(By.CLASS_NAME, 'cellTitle')
prices = driver.find_elements(By.CLASS_NAME, 'buyPrice')
# Give the remaining content a moment to finish rendering before reading the text.
time.sleep(2)

# Only print if every game name has a matching price.
if len(gamenames) == len(prices):
    for i in range(len(prices)):
        print("The Name of Game is :" + gamenames[i].text + " The Price for Which is : " + prices[i].text)
else:
    print("Parsing failed because some data was not parsed properly. Try again.")
driver.quit()

It will print:

The Name of Game is :Yu-Gi-Oh! Legacy of the Duelist The Price for Which is : $19.99
The Name of Game is :Firewatch The Price for Which is : $19.99
The Name of Game is :The Escapists The Price for Which is : $19.99
The Name of Game is :Oxenfree The Price for Which is : $19.99
The Name of Game is :Duke Nukem 3D: 20th Anniversary World Tour The Price for Which is : $19.99
The Name of Game is :Primal Carnage: Extinction The Price for Which is : $19.99
The Name of Game is :The Bunker The Price for Which is : $19.99
The Name of Game is :Shantae and the Pirate's Curse The Price for Which is : $19.99
The Name of Game is :Pure Pool The Price for Which is : $19.99
The Name of Game is :Banner Saga 2 The Price for Which is : $19.99
The Name of Game is :Armello™ The Price for Which is : $19.99
The Name of Game is :Gone Home: Console Edition The Price for Which is : $19.99
The Name of Game is :Amplitude The Price for Which is : $19.99
The Name of Game is :Dangerous Golf™ The Price for Which is : $19.99
The Name of Game is :Pure Hold'em World Poker Championship The Price for Which is : $19.99
The Name of Game is :Hard Reset Redux The Price for Which is : $19.99
The Name of Game is :Lifeless Planet: Premier Edition The Price for Which is : $19.99
The Name of Game is :The Escapists: The Walking Dead The Price for Which is : $19.99
The Name of Game is :100ft Robot Golf The Price for Which is : $19.99
The Name of Game is :Kholat The Price for Which is : $19.99
The Name of Game is :Pure Chess® Complete Bundle The Price for Which is : $19.99
The Name of Game is :Rogue Stormers The Price for Which is : $19.99
The Name of Game is :SNOW Beta The Price for Which is : $19.99
The Name of Game is :Assault Suit Leynos The Price for Which is : $19.99
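The script pairs names and prices by position and only prints when both lists have the same length. The same check can be kept in a small helper that works on plain strings, which makes it easy to try without a browser. This is only a sketch: pair_games is a hypothetical helper name, and the sample lists are based on the output above with some stray whitespace added for illustration.

```python
def pair_games(names, prices):
    """Pair each game name with its price; refuse to pair mismatched lists."""
    if len(names) != len(prices):
        raise ValueError(
            "got {} names but {} prices".format(len(names), len(prices))
        )
    # Strip stray whitespace that often surrounds scraped text.
    return [(n.strip(), p.strip()) for n, p in zip(names, prices)]


# Sample values based on the output above (whitespace added on purpose).
sample_names = ["Firewatch", "Oxenfree ", "Kholat"]
sample_prices = ["$19.99", "$19.99", " $19.99"]

for name, price in pair_games(sample_names, sample_prices):
    print("{}: {}".format(name, price))
```

With Selenium, the inputs would be the text of the found elements, for example [e.text for e in gamenames] and [e.text for e in prices].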

Hope this is what you were looking for.
