mobcity zkore, 5 years ago
Python Question

How to use the find_all method from BS4 to scrape certain strings

<li class="sre" data-tn-component="asdf-search-result" id="85e08291696a3726" itemscope="" itemtype="">
<div class="sre-entry">
<div class="sre-side-bar">
<div class="sre-content">
<div class="clickable_asdf_card" onclick="window.open('/r/85e08291696a3726?sp=0', '_blank')" style="cursor: pointer;" target="_blank">

I need to grab strings like '/r/85e08291696a3726?sp=0', which occur throughout the page. I'm not sure how to use the soup.find_all method to do this. The strings that I need always occur next to '

This is what I was thinking (below), but obviously I am getting the parameters wrong. How would I format the find_all method to return all the strings like '/r/85e08291696a3726?sp=0' throughout the page?

for divsec in soup.find_all('div', class_='clickable_asdf_card'):
    print('got links')

I read the BS4 documentation and thought about using find_all('clickable_asdf_card') to find all occurrences of the string I need, but then what? Is there a way to adjust the parameters so that it returns the string itself?
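One way to build on the attempt above: find_all('div', class_='clickable_asdf_card') already selects the right elements, and each resulting Tag gives dictionary-style access to its attributes, so the onclick value can be read and the path pulled out with a regular expression. This is a minimal sketch, assuming the path is the first single-quoted string inside onclick; the sample markup, the second id, and the helper name extract_card_links are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the snippet in the question.
SAMPLE_HTML = """
<div class="clickable_asdf_card" onclick="window.open('/r/85e08291696a3726?sp=0', '_blank')"></div>
<div class="clickable_asdf_card" onclick="window.open('/r/0123456789abcdef?sp=1', '_blank')"></div>
<div class="other_card" onclick="window.open('/r/ignored?sp=2', '_blank')"></div>
"""

# Assumption: the wanted path is the first single-quoted string in onclick.
FIRST_QUOTED = re.compile(r"'([^']*)'")


def extract_card_links(html):
    """Return the quoted path from each clickable_asdf_card div's onclick."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for divsec in soup.find_all("div", class_="clickable_asdf_card"):
        onclick = divsec.get("onclick", "")  # "" if the attribute is missing
        match = FIRST_QUOTED.search(onclick)
        if match:
            links.append(match.group(1))
    return links


print(extract_card_links(SAMPLE_HTML))
# ['/r/85e08291696a3726?sp=0', '/r/0123456789abcdef?sp=1']
```

The class_ filter does the element selection, so the regex only has to deal with one attribute value at a time; the div with a different class is never inspected.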

Answer

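Use BeautifulSoup's built-in support for regular expressions: pass a compiled pattern as the onclick filter to find the matching elements, then apply the same pattern to each onclick attribute value to extract the desired substring: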

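import re

# Group 1 captures the URL passed as the first argument to window.open(...)
pattern = re.compile(r"window\.open\('(.*?)', '_blank'\)")
# Keep only elements whose onclick attribute matches the pattern
for item in soup.find_all(onclick=pattern):
    print(pattern.search(item["onclick"]).group(1))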
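For a self-contained run of this approach, here is a sketch on hypothetical markup. It assumes, as the answer's pattern does, that each onclick holds window.open('...', '_blank'); the sample HTML, the second id, and the helper name find_onclick_urls are illustrative only.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; assumes onclick holds window.open('...', '_blank')
# as the answer's pattern expects.
SAMPLE_HTML = """
<div class="clickable_asdf_card" onclick="window.open('/r/85e08291696a3726?sp=0', '_blank')"></div>
<a href="/jobs" onclick="trackClick()">not a match</a>
<div class="clickable_asdf_card" onclick="window.open('/r/0123456789abcdef?sp=1', '_blank')"></div>
"""

# Same pattern as the answer: group 1 captures the URL passed to window.open.
PATTERN = re.compile(r"window\.open\('(.*?)', '_blank'\)")


def find_onclick_urls(html):
    soup = BeautifulSoup(html, "html.parser")
    # Passing a compiled regex as an attribute filter keeps only tags whose
    # onclick value matches it; the same regex then extracts group 1.
    return [PATTERN.search(tag["onclick"]).group(1)
            for tag in soup.find_all(onclick=PATTERN)]


print(find_onclick_urls(SAMPLE_HTML))
# ['/r/85e08291696a3726?sp=0', '/r/0123456789abcdef?sp=1']
```

Because the filter and the extraction use the same pattern, every tag returned by find_all is guaranteed to yield a match, so calling .group(1) directly is safe here.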

If there is just a single element you want to find, use find() instead of find_all().
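Unlike find_all(), which returns a (possibly empty) list, find() returns the first matching tag or None, so it is worth checking for None before reading the attribute. A minimal sketch, reusing the answer's pattern; the helper name first_onclick_url is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Same pattern as the answer: group 1 is the URL passed to window.open.
PATTERN = re.compile(r"window\.open\('(.*?)', '_blank'\)")


def first_onclick_url(html):
    """Return the URL from the first matching onclick, or None if none match."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(onclick=PATTERN)  # first matching tag, or None
    if tag is None:
        return None
    return PATTERN.search(tag["onclick"]).group(1)


print(first_onclick_url(
    "<div onclick=\"window.open('/r/85e08291696a3726?sp=0', '_blank')\"></div>"
))
# /r/85e08291696a3726?sp=0
print(first_onclick_url("<div>no onclick here</div>"))
# None
```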
