Hunter Hunter - 6 months ago 24

Find the third occurring `<p>` tag using Beautiful Soup

As the title suggests, I'm trying to understand how to find the third occurring `<p>` tag on a website (as an example, I used the following website: http://www.musicmeter.nl/album/31759).

Using the answer to this question, I tried the following code:

```python
from bs4 import BeautifulSoup
import requests

html = requests.get("http://www.musicmeter.nl/album/31759").text  # get HTML from http://www.musicmeter.nl/album/31759
soup = BeautifulSoup(html, 'html5lib')  # parse the HTML into a searchable tree

first_paragraph = soup.find('p')  # or just soup.p
print("first paragraph:", first_paragraph)

second_paragraph = first_paragraph.find_next_siblings('p')
print("second paragraph:", second_paragraph)

third_paragraph = second_paragraph.find_next_siblings('p')
print("third paragraph:", third_paragraph)
```


But this code raises the following error on the `third_paragraph` line:

```
Traceback (most recent call last):
  File "page_109.py", line 21, in <module>
    third_paragraph = second_paragraph.find_next_siblings('p')
AttributeError: 'ResultSet' object has no attribute 'find_next_siblings'
```


I tried to look up the error, but I couldn't figure out what is wrong.

Answer

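`.find_next_siblings('p')` returns a Beautiful Soup `ResultSet`, which behaves like a Python list of tags. A list has no `find_next_siblings` method, which is exactly what the `AttributeError` says. Instead of calling the method again on the result, index into it: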

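```python
first_paragraph = soup.find('p')  # soup is built as in the question
# find_next_siblings() returns a ResultSet (list-like), so index into it
siblings = first_paragraph.find_next_siblings('p')
print("second paragraph:", siblings[0])
print("third paragraph:", siblings[1])
```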
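A self-contained sketch may help to see the behavior offline. It swaps the live page for a small made-up HTML string and uses the built-in `html.parser` instead of `html5lib`; both are assumptions for illustration only. It confirms that the `ResultSet` is a list subclass and shows two other ways to reach the third `<p>`: indexing `soup.find_all('p')`, or chaining `find_next('p')`, which returns a single tag rather than a list. Unlike `find_next_siblings`, both search the whole document in order, so they still work when the `<p>` tags do not share a parent (a case where the sibling-based code above would miss the third one).

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration: the third <p> has a different parent
# than the first two, so it is not a sibling of the first <p>.
html = """
<div>
  <p>one</p>
  <p>two</p>
</div>
<div>
  <p>three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")  # assumption: stdlib parser instead of html5lib

first = soup.find("p")

# find_next_siblings() returns a ResultSet, a list subclass, not a single Tag
siblings = first.find_next_siblings("p")
print(type(siblings).__name__)           # ResultSet
print(isinstance(siblings, list))        # True
print([p.get_text() for p in siblings])  # ['two'] (only <p> tags sharing first's parent)

# Option 1: every <p> in document order; the third one is at index 2
all_paragraphs = soup.find_all("p")
third = all_paragraphs[2] if len(all_paragraphs) > 2 else None
print(third.get_text())                  # three

# Option 2: find_next() returns one Tag (or None), so calls can be chained
third_again = first.find_next("p").find_next("p")
print(third_again.get_text())            # three
```

If a page has fewer than three `<p>` tags, `find_next` returns `None` and the chained call raises `AttributeError`, so the length-checked `find_all` version is the safer choice for scraped pages.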