Rio Arfani - 3 months ago
Python Question

Python BeautifulSoup can't read div tag

I'm trying to get products for a project I'm working on from this page: lazada

page inspection

using:




from bs4 import BeautifulSoup
import urllib
import re
r = urllib.urlopen("http://www.lazada.co.id/catalog/?q=note+2").read()
soup = BeautifulSoup(r,"lxml")
letters = soup.findAll("span",class_=re.compile("product-card__name"))
print type(letters)
print letters[0]



When I do this I get the following error:


Traceback (most recent call last):
  File "C:/Python27/project/testaja.py", line 9, in <module>
    print letters[0]
IndexError: list index out of range


Any thoughts on this?

Answer

I think you may have hit their page too often. Navigate there in a browser and see what the page returns on your network.

Also, you can modify your code to check the HTTP status code of the response and make sure the page was returned properly before trying to scrape it. I modified your code to show an example of this below:

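from bs4 import BeautifulSoup
import urllib
import re

# keep the response object instead of calling .read() right away
r = urllib.urlopen("http://www.lazada.co.id/catalog/?q=note+2")
header_code = r.getcode()  # HTTP status code of the response

if header_code == 200:
    html = r.read()
    soup = BeautifulSoup(html, "lxml")
    # match any <span> whose class contains "product-card__name"
    letters = soup.findAll("span", {"class": re.compile("product-card__name")})

    # looping is safe even when nothing matched, unlike letters[0]
    for letter in letters:
        print letter
else:
    print("oops, something went wonky. Page response was: %s" % header_code)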
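If you move to Python 3, the same check needs a small change: `urllib.request.urlopen` raises `HTTPError` for 4xx/5xx responses instead of returning them, so testing `getcode()` alone is not enough. Below is a minimal sketch of that idea, not a drop-in replacement. The names `fetch_html`, `first_or_none` and the `opener` parameter are made up for illustration (the parameter only exists so the function can be tried without network access).

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetch_html(url, opener=urlopen):
    """Return (status, body); body is None unless the status is 200.

    `opener` is a hypothetical hook for testing; it defaults to
    urllib.request.urlopen.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        # Python 3 raises on 4xx/5xx; the status code is on the exception
        return err.code, None
    except URLError:
        # DNS failure, refused connection, etc.: no status code at all
        return None, None
    status = response.getcode()
    body = response.read() if status == 200 else None
    return status, body


def first_or_none(matches):
    """Avoid the IndexError from the question when nothing matched."""
    return matches[0] if matches else None
```

With this, you would pass `body` to `BeautifulSoup(body, "lxml")` only when it is not `None`, and use `first_or_none(letters)` (or a plain `for` loop) instead of `letters[0]`, so an empty result shows up as "no matches" rather than a traceback.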