Bhavesh Ghodasara - 3 months ago
Python Question

Python regular expression returns nothing for match

I am learning web scraping using Python. I am trying to extract all links from the sitemap of a popular financial site.

bsObj = BeautifulSoup(html, "html.parser")

for link in bsObj.findAll("a",
        href=re.compile("^(/india/stockmarket/pricechartquote/)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])
print('found nothing')


This code finds nothing, although the site contains many links that should match the pattern above.
Sample: /india/stockmarket/pricechartquote/A

Answer

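Have you checked whether this regex matches the sample URL path? It does not, because the * applies to the whole parenthesized group, so the pattern only accepts zero or more repetitions of the prefix followed directly by the end of the string: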

>>> import re
>>>
>>> pattern = re.compile("^(/india/stockmarket/pricechartquote/)*$")
>>> pattern.search("/india/stockmarket/pricechartquote/A")
>>>
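To see what the original pattern does accept, here is a small sketch that tries it against a few strings. The candidates list is made up for illustration; only the last entry is the sample from the question.

```python
import re

# The pattern from the question: the * quantifies the whole group.
original = re.compile(r"^(/india/stockmarket/pricechartquote/)*$")

prefix = "/india/stockmarket/pricechartquote/"

# Illustrative inputs: empty string, the bare prefix, the prefix twice,
# and the sample link from the question.
candidates = ["", prefix, prefix * 2, prefix + "A"]

# Record whether the pattern matches each candidate.
results = {s: bool(original.search(s)) for s in candidates}

for s, matched in results.items():
    print(repr(s), matched)
```

Only strings made up entirely of repeated copies of the prefix (including the empty string) match, so the trailing A in the sample link is what makes the search fail.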

Instead, you probably meant to match whatever comes after pricechartquote/, for instance one or more uppercase letters:

>>> pattern = re.compile(r"^/india/stockmarket/pricechartquote/[A-Z]+$")
>>> pattern.search("/india/stockmarket/pricechartquote/A")
<_sre.SRE_Match object at 0x109240098>

Please adjust the [A-Z]+ part depending on what kind of character set you expect to see after pricechartquote/.
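As a rough sketch of how that choice changes the result, the snippet below compares [A-Z]+ with a looser [^/]+ (any run of characters without a slash). Apart from the first entry, the hrefs are assumptions invented for illustration and may not exist on the real site.

```python
import re

PREFIX = "/india/stockmarket/pricechartquote/"

# Hypothetical hrefs; only the first one comes from the question.
hrefs = [
    PREFIX + "A",
    PREFIX + "others",  # assumed: lowercase suffix
    PREFIX + "0-9",     # assumed: digits and a hyphen
    PREFIX,             # prefix with nothing after it
    "/india/news/A",    # unrelated link
]

# Strict: one or more uppercase letters after the prefix.
uppercase_only = re.compile(r"^" + re.escape(PREFIX) + r"[A-Z]+$")
# Looser: any single non-empty path segment after the prefix.
any_segment = re.compile(r"^" + re.escape(PREFIX) + r"[^/]+$")

matched_upper = [h for h in hrefs if uppercase_only.search(h)]
matched_any = [h for h in hrefs if any_segment.search(h)]

print(matched_upper)
print(matched_any)
```

Both patterns reject the bare prefix and the unrelated link. They differ only in which suffixes they accept.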


Also note that you don't have to anchor the pattern to the beginning and end of the string. BeautifulSoup filters attribute values with the regex's search() method, so a partial URL match may be good enough:

for link in bsObj.find_all("a", href=re.compile(r"/india/stockmarket/pricechartquote/")):
    # ...
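
The difference between searching and anchored matching can be checked with re alone. This is a minimal sketch. The hrefs are hypothetical and example.com is a placeholder host, not the site from the question.

```python
import re

# Unanchored pattern, as in the loop above.
partial = re.compile(r"/india/stockmarket/pricechartquote/")

# Hypothetical hrefs for illustration.
hrefs = [
    "/india/stockmarket/pricechartquote/A",
    "https://example.com/india/stockmarket/pricechartquote/B",
    "/india/stockmarket/",
]

# search() looks for the pattern anywhere in the string.
found_by_search = [h for h in hrefs if partial.search(h)]
# match() only succeeds if the pattern matches at position 0.
found_by_match = [h for h in hrefs if partial.match(h)]

print(found_by_search)
print(found_by_match)
```

With search(), the unanchored pattern also picks up absolute URLs that contain the path, which may or may not be what you want. Add the ^ anchor back if only relative links should match.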