slpcf - 8 months ago
HTML Question

How to search for specific text in span in beautifulsoup?

I have the HTML as:

<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>

My code for getting the span text is:

trs = soup.find_all('tr')
for tr in trs:
    spans = tr.find_all('span')
    if spans.id == "ContentPlaceHolder1_grd_reminder_Label***":
        print spans.string

In the line with spans.id == "ContentPlaceHolder1_grd_reminder_Label***", I want to match all the ids that start with the same text but end with different numbers (as in the HTML above, where the ids end in 0 and 2). But my code raises an error. How can I solve it?


First of all, your current code does not work for multiple reasons:

  • spans is actually a ResultSet object (a list of tags), and it does not have an id attribute
  • even if spans were a single Tag instance, spans.id would not get you the id attribute. It would actually mean spans.find("id"), which would return None. To get the attribute value of a Tag, use it like a dictionary, e.g. span["id"]
  • you cannot do a partial match with == and * in the string
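
To make the first two points concrete, here is a small sketch. It assumes the two spans from the question sit inside a table row (the surrounding table markup is made up for illustration) and uses the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# The <table>/<tr>/<td> wrapper is an assumption; only the spans come from the question.
html = """
<table><tr><td>
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

spans = soup.find_all("span")      # ResultSet: behaves like a list of Tag objects
first = spans[0]                   # a single Tag

attr_via_dot = first.id            # dot access searches for a child <id> tag, so this is None
attr_via_key = first["id"]         # dictionary-style access reads the HTML attribute
attr_via_get = first.get("class")  # .get() gives None for a missing attribute instead of raising

print(attr_via_dot)
print(attr_via_key)
print(attr_via_get)
```

Use first["id"] when the attribute is known to exist and first.get("id") when it might be missing.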

We can do better and solve it in a cleaner way anyway.

The easiest thing to do is to use the "starts with" CSS selector:

for elm in soup.select("span[id^=ContentPlaceHolder1_grd_reminder_Label]"):
    print(elm.get_text())

Or, if you prefer find_all(), you can use either a filtering function:

for elm in soup.find_all("span", id=lambda value: value and value.startswith("ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())

Or, a regular expression:

import re

for elm in soup.find_all("span", id=re.compile("^ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())

where ^ denotes the beginning of a string.
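
For anyone who wants to try the three approaches side by side, here is a self-contained sketch. The two extra spans (one with an unrelated id, one with no id) and the prefix variable are my own additions for illustration, not part of the question:

```python
import re
from bs4 import BeautifulSoup

# First two spans are from the question; the last two are made up to show non-matches.
html = """
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
<span id="SomethingElse_0">Not a match</span>
<span>No id at all</span>
"""
soup = BeautifulSoup(html, "html.parser")
prefix = "ContentPlaceHolder1_grd_reminder_Label"

# 1. CSS "starts with" attribute selector
by_css = soup.select('span[id^="%s"]' % prefix)

# 2. Filtering function; "value and ..." skips spans that have no id (value is None)
by_func = soup.find_all("span", id=lambda value: value and value.startswith(prefix))

# 3. Regular expression anchored at the start of the id
by_regex = soup.find_all("span", id=re.compile("^" + re.escape(prefix)))

ids = [elm["id"] for elm in by_css]
texts = [elm.get_text() for elm in by_css]
print(ids)
print(texts)
```

All three should select the same two spans here, so the choice is mostly a matter of taste. The re.escape() call is only a precaution in case the prefix ever contains regex metacharacters.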