Raphadasilva - 1 month ago
Python Question

Attribute error with BeautifulSoup get method

I'm trying to write a web scraper in Python using urllib and BeautifulSoup. My laptop runs Debian, so I'm not using the latest version of urllib.

My goal is quite simple: extract values from a Wikipedia table like this one.

So I started my script with:

import urllib
from bs4 import BeautifulSoup

start ="https://fr.wikipedia.org/wiki/Liste_des_monuments_historiques_de_Strasbourg"
url = urllib.urlopen(start).read()
bsObj = BeautifulSoup(url)

table = bsObj.find("table", {"class":"wikitable sortable"})
lines = table.findAll("tr")


Then I used a for loop to scrape specific values from each row of the Wikipedia table:

for line in lines:
    longitude = line.find("data", {"class":"p-longitude"})
    print(longitude)
    latitude = line.find("data", {"class":"p-latitude"})
    print(latitude)


This gave, for example:

<data class="p-longitude" value="7.764953">7° 45′ 54″ Est</data>
<data class="p-latitude" value="48.588848">48° 35′ 20″ Nord</data>


I thought the get() method would work fine, like this:

longitude = line.find("data", {"class":"p-longitude"}).get("value")
print(longitude)


But my terminal prints this error:

Traceback (most recent call last):
  File "scraper_monu_historiques_wikipedia.py", line 14, in <module>
    longitude = line.find("data", {"class":"p-longitude"}).get("value")
AttributeError: 'NoneType' object has no attribute 'get'


I don't understand why, because my latitude and longitude variables are BeautifulSoup Tags (I checked with type()), so the get method should work...

Thanks in advance if you have the solution!

Answer

In this loop:

for line in lines:
    longitude = line.find("data", {"class":"p-longitude"})
    print(longitude)
    latitude = line.find("data", {"class":"p-latitude"})
    print(latitude)

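For some of the rows, longitude and latitude are found, but for others they are not, and find() returns None when nothing matches (a header row, for instance, likely contains no data elements). Calling get() on None is what raises the AttributeError. You have to check whether each element was found before performing any further operations, e.g.: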

for line in lines:
    longitude = line.find("data", {"class":"p-longitude"})
    latitude = line.find("data", {"class":"p-latitude"})
    if longitude and latitude:
        longitude_value = longitude.get('value')
        latitude_value = latitude.get('value')
        print(longitude_value, latitude_value)
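
If you want to try this without hitting Wikipedia, here is a minimal self-contained sketch of the same check. The inline SAMPLE_HTML and the extract_coordinates helper are made up for illustration (they are not part of your script), and I'm assuming Python 3 with the built-in "html.parser" backend. The header row has no data elements, so find() returns None for it and the row is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table: a header row without
# coordinates, followed by one data row that has them.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Monument</th><th>Coordinates</th></tr>
  <tr>
    <td>Example monument</td>
    <td>
      <data class="p-latitude" value="48.588848">48° 35′ 20″ Nord</data>
      <data class="p-longitude" value="7.764953">7° 45′ 54″ Est</data>
    </td>
  </tr>
</table>
"""


def extract_coordinates(html):
    """Return (latitude, longitude) float pairs, skipping rows that lack either."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    if table is None:
        # No matching table at all: nothing to extract.
        return []

    coordinates = []
    for row in table.find_all("tr"):
        latitude = row.find("data", {"class": "p-latitude"})
        longitude = row.find("data", {"class": "p-longitude"})
        # find() returns None when the row has no matching element,
        # so guard before calling get() on the result.
        if latitude is None or longitude is None:
            continue
        coordinates.append(
            (float(latitude.get("value")), float(longitude.get("value")))
        )
    return coordinates


print(extract_coordinates(SAMPLE_HTML))
```

Comparing against None explicitly makes the intent a bit clearer than relying on truthiness, and collecting the pairs in a list keeps the parsing separate from whatever you do with the values afterwards.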