Bhavesh Ghodasara - 11 months ago
Python Question

Python regular expression returns nothing for match

I am learning web scraping using Python. I am trying to extract all links from the site map of a popular financial site.

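import re
from bs4 import BeautifulSoup

bsObj = BeautifulSoup(html, "html.parser")

for link in bsObj.findAll("a", href=re.compile("^(/india/stockmarket/pricechartquote/)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])
    else:
        print('found nothing')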

This code finds nothing, although the site contains many links that should match the pattern above.
Sample: /india/stockmarket/pricechartquote/A

Answer

Have you checked whether this regex matches the sample URL path you provided? It does not:

>>> import re
>>> pattern = re.compile("^(/india/stockmarket/pricechartquote/)*$")
>>> print(pattern.match("/india/stockmarket/pricechartquote/A"))
None

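The * quantifier applies to the whole parenthesized group, so the pattern only matches zero or more repetitions of the prefix and leaves no room for the trailing A. Instead, you probably meant to match the part after pricechartquote/ as well, for instance as one or more uppercase letters: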

>>> pattern = re.compile(r"^/india/stockmarket/pricechartquote/[A-Z]+$")
>>> pattern.match("/india/stockmarket/pricechartquote/A")
<_sre.SRE_Match object at 0x109240098>

Please adjust the [A-Z]+ part depending on what kind of character set you expect to see after pricechartquote/.
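To get a feel for how the character class changes what is accepted, here is a small sketch that runs a few candidate patterns over sample paths. Only /india/stockmarket/pricechartquote/A comes from the question; the other paths, the pattern labels and the variable names are made up for illustration.

```python
import re

PREFIX = "/india/stockmarket/pricechartquote/"

# Candidate patterns; the first one is the pattern from the question.
patterns = {
    "repeated prefix only": re.compile(r"^(/india/stockmarket/pricechartquote/)*$"),
    "uppercase letters": re.compile(r"^/india/stockmarket/pricechartquote/[A-Z]+$"),
    "letters and digits": re.compile(r"^/india/stockmarket/pricechartquote/[A-Za-z0-9]+$"),
}

# Only the first path appears in the question; the rest are hypothetical.
paths = [
    PREFIX + "A",
    PREFIX + "others",
    PREFIX + "0-9",
    PREFIX,
]

# For each pattern, collect the paths it accepts.
results = {
    name: [p for p in paths if pattern.match(p)]
    for name, pattern in patterns.items()
}

for name, matched in results.items():
    print(name, "->", matched)
```

The pattern from the question accepts only the bare prefix, while the two corrected variants accept the sample path. If the real site uses other characters after pricechartquote/ (a hyphen, for example), the class has to be widened accordingly.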

Also note that you don't have to anchor the pattern to the beginning and end of the string; a partial URL match may be good enough:

for link in bsObj.find_all("a", href=re.compile(r"/india/stockmarket/pricechartquote/")):
    # ...
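
According to the BeautifulSoup documentation, a compiled pattern passed as a filter is applied with its search() method, so an unanchored pattern can match anywhere in the href. The sketch below imitates that on plain strings with re.search, without BeautifulSoup itself. The example.com URL and the /india/news/ path are hypothetical; only the first href comes from the question.

```python
import re

anchored = re.compile(r"^/india/stockmarket/pricechartquote/[A-Z]+$")
partial = re.compile(r"/india/stockmarket/pricechartquote/")

# Hypothetical href values; only the first appears in the question.
hrefs = [
    "/india/stockmarket/pricechartquote/A",
    "http://www.example.com/india/stockmarket/pricechartquote/B",
    "/india/news/",
]

# search() scans the whole string, so only explicit anchors restrict the position.
anchored_hits = [h for h in hrefs if anchored.search(h)]
partial_hits = [h for h in hrefs if partial.search(h)]

print(anchored_hits)
print(partial_hits)
```

The anchored pattern keeps only the relative link, while the partial one also picks up the absolute URL. That matters if the site map mixes relative and absolute hrefs.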