John Rambo - 5 months ago
Python Question

BeautifulSoup not working, getting NoneType error

I am using the following code (taken from the question "retrieve links from web page using python and BeautifulSoup"):

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']


However, I don't understand why I am getting the following error message:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable


BeautifulSoup 3.2.0
Python 2.7

EDIT:

I tried the solution from the similar question (Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable

Answer

First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4

and change your import to:

from bs4 import BeautifulSoup
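As a quick sanity check after installing, here is a minimal sketch (assuming bs4 with Python's built-in "html.parser"; the HTML string is made up and stands in for a real page) showing that the loop from the question works under version 4, because bs4 Tag objects do provide has_attr and find_all:

```python
import bs4
from bs4 import BeautifulSoup

# Made-up sample page, used instead of downloading nytimes.com
html = '<p><a href="/world">World</a> <a name="top">Top</a></p>'

print(bs4.__version__)  # should start with "4" after the upgrade

soup = BeautifulSoup(html, "html.parser")
hrefs = []
for link in soup.find_all('a'):
    # has_attr is a real method on bs4 Tag objects, so it is callable here
    if link.has_attr('href'):
        hrefs.append(link['href'])

print(hrefs)  # ['/world']
```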

Also:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

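Here, link is a Tag instance, and in BeautifulSoup 3 a Tag has no has_attr method. Remember that dot notation in BeautifulSoup is a shortcut for searching child elements: link.has_attr is treated as link.find('has_attr'), i.e. a search for a <has_attr> element inside link, which finds nothing. In other words, link.has_attr is None, and calling None('href') raises the TypeError. The error in your EDIT has the same cause: BeautifulSoup 3 names that method findAll, so BeautifulSoup(response).find_all is also None.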
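The dot-notation pitfall can be reproduced with bs4 itself. This is a minimal sketch, assuming bs4 with "html.parser"; no_such_method is a deliberately made-up name standing in for the has_attr method that BeautifulSoup 3 lacks:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><a href="/world">World</a></p>', "html.parser")

# Dot notation is a tag search: soup.a is the same object as soup.find('a')
link = soup.a

# An unknown attribute name is also treated as a tag search
# (like link.find('no_such_method')), so it evaluates to None
# instead of raising AttributeError
missing = link.no_such_method
print(missing)  # None

try:
    missing('href')  # calling None, exactly like link.has_attr('href') in BS3
except TypeError as err:
    print(err)  # 'NoneType' object is not callable
```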

Instead, do:

from bs4 import BeautifulSoup, SoupStrainer

soup = BeautifulSoup(response, "html.parser", parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])
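The same parse_only plus find_all approach can be tried without any network access. This is a sketch, assuming "html.parser" and a hypothetical helper extract_hrefs that takes raw HTML (for example the response body from the question) rather than fetching a page itself:

```python
from bs4 import BeautifulSoup, SoupStrainer


def extract_hrefs(html):
    """Hypothetical helper: return the href of every <a> tag that has one."""
    # Only <a> tags carrying an href attribute are kept while parsing
    only_links = SoupStrainer('a', href=True)
    soup = BeautifulSoup(html, "html.parser", parse_only=only_links)
    return [link['href'] for link in soup.find_all('a', href=True)]


sample = ('<div><a href="/world">World</a> <a name="top">Top</a> '
          '<a href="https://example.com/">Example</a></div>')
print(extract_hrefs(sample))  # ['/world', 'https://example.com/']
```

Filtering on href=True in both the strainer and find_all is redundant but harmless; the strainer mainly keeps the parsed tree small on large pages.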

FYI, here is the complete working code I used to debug your problem (using requests):

import requests
from bs4 import BeautifulSoup, SoupStrainer


response = requests.get('http://www.nytimes.com').content
# parse_only is the bs4 name for BS3's parseOnlyThese
soup = BeautifulSoup(response, "html.parser", parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])