edsheeran edsheeran - 3 months ago 12
Python Question

How to use BeautifulSoup to get only the strings from tags whose href starts with a specific prefix?

I am scraping usernames. Each one is in an a tag, and the hrefs all start the same way, like this:

<a href="http://lolprofile.net/summoner/eune/Sadastyczny" class="link5">Sadastyczny</a>


I tried filtering on the class link5, but other elements that I don't want to scrape also have that class. Is there a way to search for all the tags that have

href="http://lolprofile.net/summoner"


at the start of their href, ignoring the rest of the URL, since that part is different for every username?

Answer

From the BeautifulSoup documentation:

You can pass a compiled regular expression as the href filter, and find_all will return only the tags whose href matches it. If you have never used regular expressions before, you can use this:

soup.find_all("a", href=re.compile(r"^http://lolprofile\.net/summoner/"))

Don't forget to import the re module!
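
To see the whole thing end to end, here is a minimal sketch. The sample HTML is made up for illustration: only the Sadastyczny link comes from the question, while the ExamplePlayer and leaderboards links are hypothetical stand-ins for the other link5 elements on the page. It assumes bs4 is installed and uses the built-in html.parser. The last line shows a CSS attribute-prefix selector as an alternative to the regular expression; that is a different technique from the one above, not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page: two summoner links plus one unrelated link
# that shares the link5 class and should be skipped.
html = """
<a href="http://lolprofile.net/summoner/eune/Sadastyczny" class="link5">Sadastyczny</a>
<a href="http://lolprofile.net/summoner/euw/ExamplePlayer" class="link5">ExamplePlayer</a>
<a href="http://lolprofile.net/leaderboards" class="link5">Leaderboards</a>
"""

soup = BeautifulSoup(html, "html.parser")

# ^ anchors the match at the start of the href;
# re.escape keeps the dots in the domain literal.
prefix = "http://lolprofile.net/summoner/"
pattern = re.compile("^" + re.escape(prefix))

# Keep only <a> tags whose href matches, then take their text.
usernames = [a.get_text(strip=True) for a in soup.find_all("a", href=pattern)]
print(usernames)  # ['Sadastyczny', 'ExamplePlayer']

# Alternative without a regex: the CSS selector [href^="..."] means
# "href starts with".
usernames_css = [a.get_text(strip=True) for a in soup.select(f'a[href^="{prefix}"]')]
```

Both lists contain the same two usernames, and the leaderboards link is left out even though it has the link5 class.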