Larsenv - 6 months ago
Python Question

Parsing a div with a "class" attribute

Using the BeautifulSoup module in Python, I'm trying to parse the HTML below.

<div class="span-body"><div class="timestamp updated" title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>


I'm trying to get the script below to return 2016-05-08T1231Z, which is found in the title attribute of the second div (the one with the "timestamp updated" class).

from bs4 import BeautifulSoup

with open("index.html", 'rb') as source_file:
    soup = BeautifulSoup(source_file.read(), "html.parser") # Read the source file and get BeautifulSoup to work with it.
div_1 = soup.find("div", {"class": "span-body"}).contents[0] # Parse the first div.
div_2 = div_1("div", {"class": "timestamp updated"}) # Parse the second div.
print(div_2)


div_1 returns what I expected (the second div), but div_2 doesn't; it only gives me an empty list.

How can I fix this problem?

Answer

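A couple of options, in all of which you should just drop contents[0]. With contents[0], div_1 is already the inner div, so searching inside it for another div finds nothing. Calling a tag like div_1(...) is shorthand for find_all():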

div_1 = soup.find("div", {"class": "span-body"}) # Parse the first div.
div_2 = div_1("div", {"class": "timestamp updated"}) 

This will return a list with one element in it:

[<div class="timestamp updated" title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div>]
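To see why dropping contents[0] matters, here is a minimal sketch that runs both searches side by side. It assumes the HTML from the question is held in a string (the variable names outer, inner, from_inner and from_outer are only for illustration) and uses Python's built-in html.parser:

```python
from bs4 import BeautifulSoup

# The HTML from the question, inlined so the example is self-contained.
html = ('<div class="span-body"><div class="timestamp updated" '
        'title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>')
soup = BeautifulSoup(html, "html.parser")

outer = soup.find("div", {"class": "span-body"})
inner = outer.contents[0]  # this is already the "timestamp updated" div

# Calling a tag searches its descendants only, never the tag itself.
from_inner = inner("div", {"class": "timestamp updated"})  # nothing below inner
from_outer = outer("div", {"class": "timestamp updated"})  # finds the inner div

print(from_inner)
print(from_outer)
```

The search that starts from inner comes back empty because the only thing inside the timestamp div is its text.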

Alternatively, use find(), which returns the first matching element instead of a list:

div_1 = soup.find("div", {"class": "span-body"})
div_2 = div_1.find("div", {'class': 'timestamp updated'})
print(div_2)

Result:

<div class="timestamp updated" title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div>

If you don't need the intermediate div_1, why not just go straight to div_2?

div_2 = soup.find("div", {'class': 'timestamp updated'})
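One thing to be aware of: passing "timestamp updated" as the class value matches the attribute's exact string, so, as far as the BeautifulSoup documentation describes it, the same classes in a different order would not match. If that matters for your pages, a CSS selector is one possible alternative. This is only a sketch, with the question's HTML inlined and illustrative variable names:

```python
from bs4 import BeautifulSoup

# The HTML from the question, inlined so the example is self-contained.
html = ('<div class="span-body"><div class="timestamp updated" '
        'title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>')
soup = BeautifulSoup(html, "html.parser")

# Exact-string match: the class names must appear in this order.
exact = soup.find("div", {"class": "timestamp updated"})

# Same classes, swapped order: the exact-string comparison fails.
swapped = soup.find("div", {"class": "updated timestamp"})

# CSS selector: requires both classes, in any order.
selected = soup.select_one("div.updated.timestamp")

print(exact is not None, swapped is None, selected["title"])
```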

Edit from comment: To get the value of the title attribute, you can index the tag like a dictionary:

div_2['title']
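Putting it together, a small helper could look like the sketch below. The function name extract_timestamp is made up for this example, and the None checks are an assumption about how you'd want missing elements handled; find() returns None when nothing matches, and get() returns None when the attribute is absent, whereas div_2['title'] would raise a KeyError.

```python
from bs4 import BeautifulSoup


def extract_timestamp(markup):
    """Return the title of the first "timestamp updated" div, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", {"class": "timestamp updated"})
    if div is None:
        return None  # no matching div in this document
    return div.get("title")  # None if the div has no title attribute


# The HTML from the question, inlined so the example is self-contained.
html = ('<div class="span-body"><div class="timestamp updated" '
        'title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>')

print(extract_timestamp(html))
print(extract_timestamp("<p>no timestamp here</p>"))
```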