NickP - 3 months ago
Python Question

Getting attribute name rather than value with BS4

I have managed to pull out most of the attributes from a site I am scraping, but have come up short trying to extract the value of an attribute on the div tag itself.

Specifically, assuming the following:

<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>

I am after the value of data-itemid, i.e. abc.

Everything I have tried returns the text inside the div, i.e. Red Slippers, which is not what I am after.

I have tried the following, without luck:

item_id = soup.find('data-itemid')

Any ideas?

Answer Source

You can use find_all with an attribute filter to narrow your search, and then access that particular attribute with dict-like indexing.

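from bs4 import BeautifulSoup

# text holds the HTML of the page being scraped
soup = BeautifulSoup(text, 'html5lib')

# find every <div class="item">, then read the attribute by key
items = soup.find_all('div', {'class' : 'item'})
for item in items:
    print(item['data-itemid'])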

If you wish to narrow your search further, just add more key-value pairs to the dict, like this:

{'class' : 'item', 'data-color' : 'red', ...} # and so on
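
For a self-contained check of the approach, here is a minimal sketch built on the markup from the question. The second div, the html variable and the data-size attribute are made up for illustration, and it assumes the built-in 'html.parser' is acceptable in place of 'html5lib' (which has to be installed separately).

```python
from bs4 import BeautifulSoup

# Markup from the question, plus one hypothetical extra item for contrast.
html = """
<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>
<div class="item" data-color="blue" data-itemid="def">Blue Boots</div>
"""

# 'html.parser' ships with Python, so no extra parser install is needed here.
soup = BeautifulSoup(html, "html.parser")

# find() treats its first argument as a tag name, so this matches nothing.
not_found = soup.find("data-itemid")

# Filter on several attributes at once, then index the tag like a dict.
red = soup.find("div", {"class": "item", "data-color": "red"})
item_id = red["data-itemid"]

# .get() returns None instead of raising KeyError for a missing attribute.
missing = red.get("data-size")

# Passing True as the value keeps only tags that carry the attribute at all.
all_ids = [div["data-itemid"]
           for div in soup.find_all("div", attrs={"data-itemid": True})]

print(not_found, item_id, missing, all_ids)
```

Indexing with square brackets raises KeyError if the attribute is absent, so .get() is probably the safer choice when some items on the page may lack data-itemid.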