Thavivelball - 7 months ago
Python Question

Select a tag inside a class with bs4

I'm trying to get the href from this piece of HTML:

<h3 class="post-title entry-title" itemprop="name">
<a href="">01-10-16 | Free SSL Proxies (1070)</a>

So I created this script:

import urllib.request
from bs4 import BeautifulSoup

url = ""
soup = BeautifulSoup(urllib.request.urlopen(url))
for tag in soup.find_all("h3", "post-title entry-title"):
    links = tag.get("href")

But links doesn't find anything. This is because the h3 element with the class "post-title entry-title" that I selected with bs4 has no "href" attribute.

In fact, the output of:

print(tag.attrs)

is:

{'itemprop': 'name', 'class': ['post-title', 'entry-title']}

How can I select the "a" element and get the link from its href?


You can quickly solve it by getting the inner a element:

for tag in soup.find_all("h3", "post-title entry-title"):
    link = tag.a.get("href")

where tag.a is a shortcut to tag.find("a").
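
To illustrate the shortcut without fetching a page, here is a minimal sketch that parses an inline snippet. The markup, the example.com URL, and the links list are assumptions made for the example, not taken from the question's page; "html.parser" is Python's built-in parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question; the URL is a placeholder.
html = """
<h3 class="post-title entry-title" itemprop="name">
<a href="https://example.com/post">01-10-16 | Free SSL Proxies (1070)</a>
</h3>
<h3 class="post-title entry-title" itemprop="name">No link here</h3>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for tag in soup.find_all("h3", "post-title entry-title"):
    # tag.a is None when the h3 contains no <a>, so guard before calling .get()
    if tag.a is not None:
        links.append(tag.a.get("href"))

print(links)
```

The None check matters because tag.a returns None when no a element exists, and calling .get() on None raises an AttributeError.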

Or, you can match the a element directly with a CSS selector:

for a in soup.select("h3.post-title.entry-title > a"):
    link = a.get("href")

where the dot is a class selector and > denotes a direct parent-child relationship.
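
The effect of > can be seen by comparing it with a plain descendant selector. The sketch below assumes a made-up snippet with one directly nested link and one link wrapped in a span; the example.com URLs and variable names are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <a> is a direct child of the h3,
# the second is wrapped in a <span>.
html = """
<h3 class="post-title entry-title"><a href="https://example.com/direct">direct</a></h3>
<h3 class="post-title entry-title"><span><a href="https://example.com/nested">nested</a></span></h3>
"""

soup = BeautifulSoup(html, "html.parser")

# "h3.post-title.entry-title" requires both classes on the h3;
# "> a" only matches <a> elements that are immediate children.
direct = [a.get("href") for a in soup.select("h3.post-title.entry-title > a")]

# Dropping ">" matches <a> elements at any depth below the h3.
any_depth = [a.get("href") for a in soup.select("h3.post-title.entry-title a")]

print(direct)
print(any_depth)
```

If the links on the real page might be wrapped in other elements, the descendant form without > is probably the safer choice.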

Or, you can check the itemprop attribute instead of the class:

for a in soup.select("h3[itemprop=name] > a"):
    link = a.get("href")
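
As a rough check that the attribute selector only picks up headings marked with itemprop="name", here is a small sketch. The second heading with itemprop="headline" and the example.com URLs are invented for contrast.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first h3 carries itemprop="name".
html = """
<h3 class="post-title entry-title" itemprop="name"><a href="https://example.com/a">A</a></h3>
<h3 class="post-title entry-title" itemprop="headline"><a href="https://example.com/b">B</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")

# [itemprop=name] matches elements whose itemprop attribute equals "name".
by_itemprop = [a.get("href") for a in soup.select("h3[itemprop=name] > a")]

print(by_itemprop)
```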