st4rgut - 14 days ago
Python Question

How to find links within a specified class with Beautiful Soup

I'm using Beautiful Soup 4 to parse a news site for links contained in the body text. I was able to find all the paragraphs that contain the links, but paragraph.get('href') returned None for each one. I'm using Python 3.5.1. Any help is really appreciated.

from bs4 import BeautifulSoup
import urllib.request
import re

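url = "http://www.cnn.com/2016/11/18/opinions/how-do-you-deal-with-donald-trump-dantonio/index.html"
# BeautifulSoup parses markup, not URLs, so download the page first
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")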

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    print(paragraph.get('href'))

Answer

Is this what you actually want?

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    for a in paragraph("a"):
        print(a.get('href'))

Note that paragraph.get('href') looks for an href attribute on the <div> tag you found. Since that <div> has no such attribute, it returns None. Most likely you need to find all the <a> tags that are descendants of your <div> (paragraph("a") is a shortcut for paragraph.find_all("a")) and then read the href attribute of each <a> element.
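
To make the difference concrete, here is a small self-contained sketch that runs without touching the network. The SAMPLE_HTML string and the links_in_paragraphs helper are made up for illustration and are not the real CNN markup; only the class name zn-body__paragraph comes from the question. It also passes href=True to find_all, which is an optional extra that skips <a> tags without an href.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the article page (an assumption,
# not the real CNN HTML).
SAMPLE_HTML = """
<div class="zn-body__paragraph">Read <a href="http://example.com/a">this</a>
and <a href="http://example.com/b">that</a>, or <a name="anchor">neither</a>.</div>
<div class="zn-body__paragraph">No links here.</div>
<div class="other"><a href="http://example.com/ignored">skip</a></div>
"""


def links_in_paragraphs(html, css_class="zn-body__paragraph"):
    """Collect href values of <a> tags nested inside matching <div> tags."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for paragraph in soup.find_all("div", class_=css_class):
        # href=True keeps only <a> tags that actually carry an href attribute
        for a in paragraph.find_all("a", href=True):
            hrefs.append(a["href"])
    return hrefs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_div = soup.find("div", class_="zn-body__paragraph")

# The <div> itself has no href attribute, so .get() falls back to None
print(first_div.get("href"))

# The links live on the nested <a> tags
print(links_in_paragraphs(SAMPLE_HTML))
```

The <a> without an href and the link inside the other <div> are both left out of the result, so the helper returns only the links from the body paragraphs.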
