user3257755 - 3 months ago
HTML Question

Python BeautifulSoup - Find all elements whose class names begin with some string

Assume that we want to find all li elements whose class names all begin with a known string and end with an arbitrary id number.

That means that this approach doesn't work:

soup.find_all("li", {"class": KNOWN_STRING})


I have also tried this approach without any luck:

soup.select("li[class^="+KNOWN_STRING)


How can this be solved?

Answer

I would use a regular expression for this.

import re

soup.find_all('li', {'class': re.compile(r'regex_pattern')})

Because you have a known string followed by an arbitrary (I'm assuming unknown) number, you can use a regular expression to describe the pattern you expect the class name to follow. Example:

re.compile(r'^KNOWN_STRING[0-9]+$')

This matches any class name that consists of the known string followed by one or more digits. The documentation for Python's re module has more about regular expressions.
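A minimal sketch of the regex approach, assuming KNOWN_STRING is a Python variable (as in the question) rather than literal text inside the pattern. The markup and the prefix "item-" are made up for illustration. re.escape is used so that any characters in the known string that are special in a regular expression are treated literally.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup and prefix, invented for this illustration.
html = """
<ul>
  <li class="item-1">one</li>
  <li class="item-27">two</li>
  <li class="item-x">no trailing number</li>
  <li class="other-3">different prefix</li>
</ul>
"""
KNOWN_STRING = "item-"

soup = BeautifulSoup(html, "html.parser")

# Escaped known prefix, then one or more digits, anchored at both ends.
pattern = re.compile(r"^" + re.escape(KNOWN_STRING) + r"[0-9]+$")

matches = soup.find_all("li", {"class": pattern})

# Each tag's "class" attribute is a list of class names.
matched_classes = [li["class"] for li in matches]
print(matched_classes)
```

Only the first two li elements should be selected here; the other two fail on the trailing digits and on the prefix, respectively.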

Edit, to answer the question:

Would this be correct given two digits in the id? soup.find_all('li', {'class': re.compile(r'^TheMatch v-1 c-[0-9][0-9]+$')}). I assume that it wouldn't.

For two digits at the end you would do:

soup.find_all('li', {'class': re.compile(r'^TheMatch v-1 c-[0-9]{2}$')})

The + just means one or more of the preceding element, so [0-9][0-9]+ matches two or more digits rather than exactly two.

What I did was put the exact number of repetitions I was expecting, 2, in braces ({2}) after the character class.
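To see the difference between the two quantifiers, here is a small comparison that uses only the re module. The sample class strings are hypothetical; the two patterns are the ones from the comment and from the answer.

```python
import re

# Pattern from the comment: one digit followed by one or more digits,
# which means two or more digits in total.
two_or_more = re.compile(r"^TheMatch v-1 c-[0-9][0-9]+$")

# Pattern from the answer: exactly two digits.
exactly_two = re.compile(r"^TheMatch v-1 c-[0-9]{2}$")

# Hypothetical class attribute values, invented for this comparison.
samples = ["TheMatch v-1 c-7", "TheMatch v-1 c-42", "TheMatch v-1 c-123"]

for value in samples:
    print(value, bool(two_or_more.search(value)), bool(exactly_two.search(value)))
```

Both patterns accept the two-digit id and reject the one-digit id, but only the [0-9][0-9]+ version also accepts the three-digit id.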
