
How to search for specific text in span in beautifulsoup?

I have the following HTML:

<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
</tr>
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
</tr>
...


My code for getting the span text is:

trs = soup.find_all('tr')
for tr in trs:
    spans = tr.find_all('span')
    if spans.id == "ContentPlaceHolder1_grd_reminder_Label***":
        print spans.string


In the line spans.id == "ContentPlaceHolder1_grd_reminder_Label***", I want to match all the ids that start with the same text but end in different numbers (in the HTML above, the varying part at the end is 1_0, 1_2, and so on). But my code raises an error. How can I solve it?

Answer

First of all, your current code does not work for multiple reasons:

  • spans is actually a ResultSet object (a list of tags), and it does not have an id attribute
  • even if spans were a single Tag instance, spans.id would not give you the id attribute: it would actually mean spans.find("id"), which returns None. To get an attribute value of a Tag, index it like a dictionary, e.g. span["id"]
  • you cannot do a partial match with == and * in the string: == compares the two strings literally, so * is not treated as a wildcard
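Here is a minimal sketch that reproduces these three problems on one row of the question's HTML. It assumes BeautifulSoup 4 with Python's built-in "html.parser". The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
</tr>
"""
soup = BeautifulSoup(html, "html.parser")

tr = soup.find("tr")
spans = tr.find_all("span")
# find_all() returns a ResultSet (a list subclass), not a single Tag
print(type(spans).__name__)  # ResultSet

span = spans[0]
# Attribute-style access on a Tag looks for a child tag named "id"; there is none
print(span.id)  # None
# HTML attributes are read with dictionary-style indexing
print(span["id"])  # ContentPlaceHolder1_grd_reminder_Label1_0

# == compares the strings literally, so "*" is not a wildcard
print(span["id"] == "ContentPlaceHolder1_grd_reminder_Label***")  # False
```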

We can do better and solve it in a cleaner way anyway.


The easiest approach is the "starts with" CSS attribute selector, [attr^=value]:

for elm in soup.select("span[id^=ContentPlaceHolder1_grd_reminder_Label]"):
    print(elm.get_text())
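For reference, here is a self-contained version of the selector approach, run against the two rows from the question. It assumes BeautifulSoup 4.7 or newer, where select() is backed by the soupsieve package, and it uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Sample markup from the question: the ids differ only in their numeric suffix
html = """
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
</tr>
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
</tr>
"""
soup = BeautifulSoup(html, "html.parser")

# [id^=...] matches every span whose id attribute starts with the given prefix
matches = soup.select("span[id^=ContentPlaceHolder1_grd_reminder_Label]")
for elm in matches:
    print(elm["id"], elm.get_text())
```

This prints each matching id followed by its text, one line per span.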

Or, if you prefer find_all(), you can use either a filtering function:

# value is None for spans that have no id, hence the "value and" guard
for elm in soup.find_all("span", id=lambda value: value and value.startswith("ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())
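The guard in front of startswith() matters because BeautifulSoup calls the filter for every span, including ones without an id, and passes None in that case. The sketch below uses a named function instead of the lambda. PREFIX and has_prefix are illustrative names, not part of BeautifulSoup, and the extra spans are made up for the demonstration.

```python
from bs4 import BeautifulSoup

PREFIX = "ContentPlaceHolder1_grd_reminder_Label"

# The second span has no id, so the filter receives None for it
html = """
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
<span>no id here</span>
<span id="SomethingElse_1">Other</span>
"""
soup = BeautifulSoup(html, "html.parser")

def has_prefix(value):
    # Check for None before calling startswith() to avoid an AttributeError
    return value is not None and value.startswith(PREFIX)

texts = [elm.get_text() for elm in soup.find_all("span", id=has_prefix)]
print(texts)  # ['Engineering Mechanics']
```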

Or, a regular expression:

import re

for elm in soup.find_all("span", id=re.compile("^ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())

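where ^ anchors the pattern to the beginning of the id. This matters because BeautifulSoup applies the pattern with search(), so without ^ the prefix could match anywhere in the attribute value.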
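If you only want the numbered ids described in the question, the regular expression can also pin down the suffix. This sketch assumes that the varying part is always <digits>_<digits>, as in 1_0 and 1_2. The LabelHeader and Copy_ ids are made up to show what gets excluded, and the second pattern shows that dropping ^ lets the prefix match anywhere.

```python
import re
from bs4 import BeautifulSoup

html = """
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
<span id="ContentPlaceHolder1_grd_reminder_LabelHeader">Header</span>
<span id="Copy_ContentPlaceHolder1_grd_reminder_Label1_3">Copy</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumed id shape: the prefix, then <digits>_<digits>, then the end of the string
strict = re.compile(r"^ContentPlaceHolder1_grd_reminder_Label\d+_\d+$")
ids = [elm["id"] for elm in soup.find_all("span", id=strict)]
print(ids)  # ['ContentPlaceHolder1_grd_reminder_Label1_0', 'ContentPlaceHolder1_grd_reminder_Label1_2']

# Without ^ (and $), search() finds the prefix anywhere, so all four spans match
loose = re.compile("ContentPlaceHolder1_grd_reminder_Label")
print(len(soup.find_all("span", id=loose)))  # 4
```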
