I am learning web scraping using Python. I am trying to extract all links from the sitemap of a popular financial site.
bsObj = BeautifulSoup(html, "html.parser")
for link in bsObj.findAll("a",
if 'href' in link.attrs:
Have you tried checking whether this regex matches the provided part of the URL? It does not:
>>> import re
>>>
>>> pattern = re.compile("^(/india/stockmarket/pricechartquote/)*$")
>>> pattern.search("/india/stockmarket/pricechartquote/A")
>>>
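The (...)* group only matches zero or more repetitions of the whole prefix, so anything after the final slash makes the match fail. Instead, you meant to match the last part after pricechartquote/ with, for instance, one or more uppercase letters: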
>>> pattern = re.compile(r"^/india/stockmarket/pricechartquote/[A-Z]+$")
>>> pattern.search("/india/stockmarket/pricechartquote/A")
<_sre.SRE_Match object at 0x109240098>
Please adjust the [A-Z]+ part depending on the set of characters you expect to see in that last path segment.
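To make the difference between the patterns concrete, here is a small sketch that runs three of them over a few paths. The sample paths and the broader `[\w-]+` variant are made up for illustration; which character class is right depends on what the real sitemap contains.

```python
import re

PREFIX = "/india/stockmarket/pricechartquote/"

# Original pattern: the group followed by * means "the whole prefix, repeated
# zero or more times", and $ then demands the end of the string.
original = re.compile(r"^(/india/stockmarket/pricechartquote/)*$")

# Pattern from the answer: the prefix followed by one or more uppercase letters.
uppercase_only = re.compile(r"^/india/stockmarket/pricechartquote/[A-Z]+$")

# Hypothetical broader variant: letters, digits, underscores and hyphens.
broader = re.compile(r"^/india/stockmarket/pricechartquote/[\w-]+$")

# Sample paths are invented for this illustration.
samples = [PREFIX, PREFIX + "A", PREFIX + "abc-123", PREFIX + "A/extra"]

# Map each path to (original, uppercase_only, broader) match results.
results = {
    path: (
        bool(original.search(path)),
        bool(uppercase_only.search(path)),
        bool(broader.search(path)),
    )
    for path in samples
}

for path, flags in results.items():
    print(path, flags)
```

Only the bare prefix satisfies the original pattern, which is why none of the real links were picked up.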
Also note that you don't have to anchor the pattern to the beginning and end of the string; a partial URL match may be all you need:
for link in bsObj.find_all("a", href=re.compile(r"/india/stockmarket/pricechartquote/")):
    # ...
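As a rough end-to-end sketch of the partial-match approach, the snippet below runs the same filter against a small inline HTML fragment. The markup is invented for illustration and only stands in for the real sitemap page, whose structure is not shown here.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup standing in for the downloaded sitemap page.
html = """
<ul>
  <li><a href="/india/stockmarket/pricechartquote/A">A</a></li>
  <li><a href="/india/stockmarket/pricechartquote/B">B</a></li>
  <li><a href="/india/news/">News</a></li>
  <li><a name="no-href">Anchor without href</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: BeautifulSoup applies it with search(), so it only
# needs to occur somewhere inside the href value.
pattern = re.compile(r"/india/stockmarket/pricechartquote/")

# Tags without an href never match the filter, so no separate
# "'href' in link.attrs" check is needed here.
quote_links = [a["href"] for a in soup.find_all("a", href=pattern)]
print(quote_links)
```

Because the filter is applied to the attribute value itself, the loop body can use link["href"] directly.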