slpcf - 1 month ago
Python Question

How to find the text of <div><span>text</span></div> in BeautifulSoup?

This is the HTML:

<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
<a href="/profile/Dileep-Sankhla/followers">Followers <span class="list_count">92</span></a></li></div></div>


I want to extract the text 92, convert it into an integer and print it in Python 2. How can I do that?
Code:

i = soup.find('div', id='NhsjLK')
print "Followers :", i.find('span', id='list_count').text

Answer

I would not get it by the class directly, since "list_count" is likely too broad a class value and might be used by other elements on the page.
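For comparison, the attempt in the question fails because list_count is the span's class, not its id, so find() returns None and .text raises AttributeError. A minimal sketch of the direct class lookup, scoped to the div#NhsjLK container so that list_count elements elsewhere on the page are not picked up, might look like this (BeautifulSoup spells the keyword argument class_ because class is reserved in Python):

```python
from bs4 import BeautifulSoup

# HTML snippet from the question
data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
<a href="/profile/Dileep-Sankhla/followers">Followers <span class="list_count">92</span></a></li></div></div>"""

soup = BeautifulSoup(data, "html.parser")

# Narrow the search to the container first
container = soup.find("div", id="NhsjLK")

# class_ filters on the class attribute; id="list_count" would match nothing here
span = container.find("span", class_="list_count")
count = int(span.get_text())

# %-formatting with print(...) works the same on Python 2 and Python 3
print("Followers : %d" % count)
```

Scoping to the container reduces, but does not eliminate, the risk of matching an unrelated list_count span inside the same div.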

Judging by this HTML snippet alone, there are several options, but one of the nicest, from my point of view, is to locate the "Followers" text label and take its next sibling:

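from bs4 import BeautifulSoup

data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
<a href="/profile/Dileep-Sankhla/followers">Followers <span class="list_count">92</span></a></li></div></div>"""

soup = BeautifulSoup(data, "html.parser")

# Find the "Followers " text node (string= is the bs4 >= 4.4 name for the older text= argument)
label = soup.find(string=lambda text: text and text.startswith('Followers'))

# The <span class="list_count"> directly follows that text node
count = int(label.next_sibling.get_text())
print(count)  # print(...) with a single argument works on Python 2 and 3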
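The one-liner above raises AttributeError when the "Followers" label is missing, and next_sibling returns whatever node comes right after the label, even if it is not the span. A slightly more defensive sketch follows; the function name followers_count is hypothetical. It uses find_next_sibling("span") to skip any non-span nodes and returns None instead of raising:

```python
from bs4 import BeautifulSoup


def followers_count(soup):
    """Return the follower count as an int, or None if it cannot be found."""
    # Match the text node that starts with "Followers", ignoring surrounding whitespace
    label = soup.find(
        string=lambda s: s is not None and s.strip().startswith("Followers")
    )
    if label is None:
        return None
    # Take the first <span> sibling after the label, skipping other nodes
    span = label.find_next_sibling("span")
    if span is None:
        return None
    return int(span.get_text(strip=True))


# Usage on a trimmed-down version of the question's markup
html = '<a href="/profile/Dileep-Sankhla/followers">Followers <span class="list_count">92</span></a>'
print(followers_count(BeautifulSoup(html, "html.parser")))
print(followers_count(BeautifulSoup("<div>No counts here</div>", "html.parser")))
```

Returning None lets the caller decide how to handle a page layout change instead of crashing mid-scrape.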

Another concise and reliable approach is to use a partial match (the *= operator below) on the href value of the parent a element:

count = int(soup.select_one("a[href*=followers] .list_count").get_text())
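To illustrate why the href match is more reliable than a bare class lookup, here is a sketch on a variant of the question's markup. It adds a hypothetical "Following" item, invented for illustration, that reuses the list_count class. A plain .list_count selector returns the first match in document order, while the href filter only matches links whose href contains "followers":

```python
from bs4 import BeautifulSoup

# The question's markup plus a hypothetical "Following" item that also uses
# class="list_count" (assumed for illustration only)
data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowingNavItem NavItem">
<a href="/profile/Dileep-Sankhla/following">Following <span class="list_count">40</span></a></li>
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
<a href="/profile/Dileep-Sankhla/followers">Followers <span class="list_count">92</span></a></li></div></div>"""

soup = BeautifulSoup(data, "html.parser")

# Bare class lookup: picks the first list_count span, which is the wrong one here
by_class = int(soup.select_one(".list_count").get_text())

# href*=followers: "/following" does not contain "followers", so only the right link matches
by_href = int(soup.select_one("a[href*=followers] .list_count").get_text())

print(by_class, by_href)
```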

Alternatively, you could check the class value of the parent li element:

count = int(soup.select_one("li.FollowersNavItem .list_count").get_text())
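The li element carries several classes, and both the CSS class selector and BeautifulSoup's class_ filter match an element if any one of its classes equals the given value. If you prefer to avoid CSS selectors, a roughly equivalent lookup with find() might look like this sketch:

```python
from bs4 import BeautifulSoup

# HTML snippet from the question
data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
<a href="/profile/Dileep-Sankhla/followers">Followers <span class="list_count">92</span></a></li></div></div>"""

soup = BeautifulSoup(data, "html.parser")

# class_="FollowersNavItem" matches because it is one of the li's several classes
li = soup.find("li", class_="FollowersNavItem")
count = int(li.find("span", class_="list_count").get_text())

# The CSS selector version finds the same span
css_count = int(soup.select_one("li.FollowersNavItem .list_count").get_text())

print(count, css_count)
```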