fahrradlaus, 4 years ago, 93
CSS Question

Select items with BeautifulSoup

I have the following list of results:

<div id="resultlist" class="result-list ">
<article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">
<article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">
<article class="search-result-entry" id="wh_adition_FakeAd1">
<article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">
...


With BeautifulSoup, I am trying to select all entries that have the class "search-result-entry" and the itemtype="http://schema.org/Residence".

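import bs4
import requests

def parse_results(url):  # hypothetical wrapper name; the snippet uses "return", so it runs inside a function
    response = requests.get(url)

    # cancel parsing if the page doesn't exist ("!=" compares values; "is not" compares identity)
    if response.status_code != 200:
        return

    soup = bs4.BeautifulSoup(response.text, "lxml")
    # print(soup.select("#resultlist"))

    # goal: select all listings from the list, excluding ads
    results = soup.select('.search-result-entry')
    print(results)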


However, this currently also selects the entry with id="wh_adition_FakeAd1" (an ad), which causes an IndexError a few lines later.
I also tried the following, without success:

results = soup.select('.search-result-entry meta[itemtype=http://schema.org/Residence]')


Any idea how I can select only the entries I need?

Many thanks in advance.

Answer

You can try this: find all article tags that have the desired itemtype. This prints only the entries with that attribute, so the ad entry, which has no itemtype, is skipped.

for line in soup.find_all("article", {"itemtype": "http://schema.org/Residence"}):
    print(line)
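Since the question uses `select()`, the same filter can also be written as a CSS selector. The attempt `.search-result-entry meta[itemtype=...]` finds nothing because the space is a descendant combinator: it looks for `<meta>` elements inside the entries instead of checking the `<article>` itself. The attribute value should also be quoted, because `:` and `/` are not allowed in an unquoted CSS attribute value. A minimal sketch, assuming BeautifulSoup 4.7 or later (CSS selectors via its soupsieve backend); the sample markup closes the tags from the question and uses made-up listing text, and `html.parser` is used so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question's snippet; the listing text is made up.
html = """
<div id="resultlist" class="result-list ">
  <article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">Listing 1</article>
  <article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">Listing 2</article>
  <article class="search-result-entry" id="wh_adition_FakeAd1">Ad</article>
  <article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">Listing 3</article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# No space between the class and the attribute selector, so both conditions
# apply to the same <article>. The value is quoted because it contains ":" and "/".
results = soup.select('article.search-result-entry[itemtype="http://schema.org/Residence"]')

for entry in results:
    print(entry.get_text(strip=True))
```

This should print the three listings and skip the ad, because the ad `<article>` has no `itemtype` attribute.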

You can do the same for the entries that have an id:

for line in soup.find_all("article", {"id": "wh_adition_FakeAd1"}):
    print(line)
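If you would rather keep the original `.search-result-entry` selection and just drop the ad, CSS `:not()` can exclude it in the same `select()` call. A minimal sketch, again assuming BeautifulSoup 4.7 or later and a made-up sample based on the question's markup. The second selector rests on an assumption: that real listings never carry an `id`, which holds for the snippet in the question but may not hold on the live site.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question's snippet; the listing text is made up.
html = """
<div id="resultlist" class="result-list ">
  <article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">Listing 1</article>
  <article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">Listing 2</article>
  <article class="search-result-entry" id="wh_adition_FakeAd1">Ad</article>
  <article itemscope="" itemtype="http://schema.org/Residence" class="search-result-entry ">Listing 3</article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep every result entry except the one with the known ad id.
listings = soup.select("article.search-result-entry:not(#wh_adition_FakeAd1)")

# Broader variant: drop any entry that has an id at all
# (assumes real listings never have an id).
listings_no_id = soup.select("article.search-result-entry:not([id])")

print([entry.get_text(strip=True) for entry in listings])
```

The first selector is the narrowest fix; the `:not([id])` variant also catches ads with other ids, but only if the site never puts ids on genuine listings.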

Hope this helps.
