Jen Scott - 1 year ago 63
Python Question

Find specific link w/ beautifulsoup

Hi, I cannot figure out how to find links which begin with certain text.
findAll('a') works fine, but it returns far too many links. I just want to make a list of all links that begin with

Can anyone help me?

Thank you very much

Answer Source

First set up a test document and open up the parser with BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="">somelink</a></div><a href="">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
   <a href="something">
   <a href="">
  <a href="">

Next, we can search for all <a> tags whose href attribute starts with the text you are after. You can use a regular expression for that:

>>> import re
>>> soup.findAll('a', href=re.compile(r'^\?id='))
[<a href="">somelink</a>, <a href="">another</a>]
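
For anyone on Python 3, here is a rough sketch of the same idea using the newer bs4 package (beautifulsoup4) and its find_all method. The PREFIX value and the test document below are made up for illustration, since they are not taken from the question; swap in the text your links really start with. It assumes bs4 is installed.

```python
import re
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

# Hypothetical prefix and document, made up for illustration only.
PREFIX = "/item?id="
doc = (
    '<html><body>'
    '<div><a href="something">yep</a></div>'
    '<div><a href="/item?id=1">somelink</a></div>'
    '<a href="/item?id=2">another</a>'
    '</body></html>'
)

soup = BeautifulSoup(doc, "html.parser")

# re.escape() quotes the "?" so it is matched literally;
# "^" anchors the match at the start of the href value.
pattern = re.compile("^" + re.escape(PREFIX))
by_regex = [a["href"] for a in soup.find_all("a", href=pattern)]

# Same filter without a regex: the function receives each href value,
# or None when the tag has no href attribute.
by_function = [
    a["href"]
    for a in soup.find_all("a", href=lambda h: h is not None and h.startswith(PREFIX))
]

print(by_regex)
```

Building the pattern with re.escape saves you from escaping characters like "?" or "." by hand, and the plain startswith version may be easier to read if you do not need anything beyond a fixed prefix.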