mariolopes mariolopes - 5 months ago 46
Python Question

Python scrape style display:none

I want to scrape a webpage and find out whether an element's style is display: none or display: block, as in the markup below. (The style does not appear when I view the page source; I only know about it because I used Inspect Element in Chrome.)

<p id="add_to_cart" class="buttons_bottom_block no-print" style="display: none;">
<button type="submit" name="Submit" class="exclusive">
<span>¡Cómprame!</span>
</button>
</p>


<p id="add_to_cart" class="buttons_bottom_block no-print" style="display: block;">
<button type="submit" name="Submit" class="exclusive">
<span>¡Cómprame!</span>
</button>
</p>


The page belongs to an online PrestaShop store.
Please watch this video: https://youtu.be/wlngNaNw1Ao
You'll see the div oosHook switch its style between display:block and display:none, but you can't see this in the source code. Please check the link
https://www.esenciadeperfume.com/bvlgari/bvlgari-man-in-black-edp.html#/6-formato-100_ml_tester

Select one product format and then another and you'll see the change, but the source code looks the same for every choice. I wrote the following Python code as a test, and it can't detect the changes:

import urllib.request
import re
import pymysql
from bs4 import BeautifulSoup

#link1='https://www.esenciadeperfume.com/bvlgari/bvlgari-man-in-black-edp.html#/6-formato-100_ml_tester'
link1 = "my reputation doesn't allow"  # placeholder; the real URL is the commented-out line above
req = urllib.request.Request(link1, headers={'User-Agent': 'Mozilla/5.0'})
htmltext = urllib.request.urlopen(req).read()
if htmltext is None:
    print('error')
else:
    matches = re.findall('<div id="oosHook" style="display: block;">', str(htmltext))
    if len(matches) == 0:
        print('Not found')
    else:
        print('Found')


OK, it seems the following code does the job:

import urllib.request
import re
import pymysql
from bs4 import BeautifulSoup
from selenium import webdriver
link1='https://www.esenciadeperfume.com/bvlgari/bvlgari-man-in-black-edp.html#/6-formato-100_ml_tester'
#link1='https://www.esenciadeperfume.com/bvlgari/bvlgari-man-in-black-edp.html#/20-formato-60_ml'
browser = webdriver.Firefox() # Your browser will open, Python might ask for permission
browser.get(link1) # This might take a while
soup = BeautifulSoup(browser.page_source,'html.parser')
cart_style = soup.find('p', id='add_to_cart').get('style')
oos_style = soup.find('div', id='oosHook').get('style')
print('Oos_style-> ' + str(oos_style))  # str() avoids a TypeError when the element has no style attribute


The problem: the process is too slow.

Answer

I'm assuming you know how to make a request and get the page source in Python.

If you work with BeautifulSoup, you can search for the elements and read their tags and attributes from there. You could have something like:

from bs4 import BeautifulSoup as bs

soup = bs(source_code, 'html.parser')  # source_code: the HTML text you downloaded
elements = soup.find_all('p')

for e in elements:
    # Default to '' so elements without a style attribute don't raise an error.
    style = e.get('style', '').split(';')  # Accounts for multiple entries in the style
    for s in style:
        if 'display' in s:
            print(s.split(':')[1].strip())  # Prints 'none', 'block' or any other display style.


You could also handle the styles in several other ways. I kept this version because it is easy to follow, but you could take a more direct approach or use re to parse the style string directly.
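As a minimal sketch of the more direct re approach, the helper below pulls the display value out of an inline style string. The name display_value and the sample strings are illustrative assumptions, not something from the page being scraped.

```python
import re

# Matches "display: <value>" inside an inline style string. The lookbehind
# keeps it from matching inside a longer property name that ends in "display".
DISPLAY_RE = re.compile(r"(?<![\w-])display\s*:\s*([^;\s]+)", re.IGNORECASE)


def display_value(style):
    """Return the display value from an inline style string, or None."""
    if not style:
        return None
    match = DISPLAY_RE.search(style)
    return match.group(1).lower() if match else None


print(display_value("display: none;"))             # none
print(display_value("color: red; display:block"))  # block
print(display_value(None))                         # None
```

You could pass it the result of e.get('style') from the loop above; it returns None when the element has no style attribute or no display entry.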


EDIT

OK, you are trying to scrape a dynamic webpage, and that's a little different. You need a real browser session, and you have to wait for the page's JavaScript to make all the changes it needs to make.
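One likely reason a plain request returns the same HTML for every choice: everything after # in a URL is a fragment, which the browser keeps to itself and does not send to the server. The sketch below uses the two URLs from the question to show that they reduce to the same request. Treat it as an illustration of the idea, not a full diagnosis of this particular shop.

```python
from urllib.parse import urldefrag

url_a = "https://www.esenciadeperfume.com/bvlgari/bvlgari-man-in-black-edp.html#/6-formato-100_ml_tester"
url_b = "https://www.esenciadeperfume.com/bvlgari/bvlgari-man-in-black-edp.html#/20-formato-60_ml"

# urldefrag splits a URL into the part that is requested and the fragment.
base_a, fragment_a = urldefrag(url_a)
base_b, fragment_b = urldefrag(url_b)

print(base_a == base_b)  # True: both choices request the same document
print(fragment_a)        # /6-formato-100_ml_tester
print(fragment_b)        # /20-formato-60_ml
```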

I tried here and successfully got a page using the selenium package. Instead of using a simple request, try the following:

from bs4 import BeautifulSoup as bs
from selenium import webdriver

# There are actually several browser options here; choose the one you like most
# (the browser needs to be installed on your PC).
browser = webdriver.Firefox()  # Your browser will open, Python might ask for permission
browser.get(url)               # url: the page address; this might take a while
soup = bs(browser.page_source, 'html.parser')

# And then you can keep working from here
cart_style = soup.find('p', id='add_to_cart').get('style')
oos_style = soup.find('div', id='oosHook').get('style')
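As a hedged variation: instead of parsing page_source, Selenium can wait for each element and ask the browser for its computed display value directly. The sketch assumes Selenium 4 with Firefox and geckodriver installed. The helper names, the 10-second timeout and the headless mode are my own illustrative choices, and headless mode may or may not make the run noticeably faster.

```python
def is_shown(display):
    """True unless the computed display value is 'none'."""
    return display.strip().lower() != "none"


def read_display(url, element_ids, timeout=10):
    """Open url in headless Firefox and return {element_id: computed display}."""
    # Selenium is imported lazily so is_shown() can be used on its own.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run without a visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        result = {}
        for element_id in element_ids:
            # Wait until the element is present in the DOM (up to `timeout` seconds).
            element = WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.ID, element_id))
            )
            result[element_id] = element.value_of_css_property("display")
        return result
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling read_display(url, ['add_to_cart', 'oosHook']) returns a dict of display values that you can pass to is_shown. Note that presence only guarantees the element exists; if the page's script updates the style later, a different wait condition may be needed.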