Arqam Arqam - 2 years ago 58
Python Question

How to extract text using BeautifulSoup when the same tag also appears in places that are not useful

I am doing a bit of web scraping and I need the text in between the paragraph tags:


<small>147 out of 252 people found the following review useful:</small><br>
<a href="/user/ur0935867/"><img class="avatar" src="" height=${avatar.image.size} width=${avatar.image.size}></a>
<h2>Unbelievable and way overrated</h2>
<img width="102" height="12" alt="3/10" src=""><br>
<a href="/user/ur0935867/">glenmoreland</a> <small>from Holland</small><br>
<small>18 January 2016</small><br>
<p><b>*** This review may contain spoilers ***</b></p>

I cannot believe how many people think this is a good movie....watching
a guy struggle to survive for 2 hours ...come on people..I know there
are not many good movies being made but my many things are
unbelievable...the bear attack, carrying a near dead guy out of the
wilderness up a mountain...going over a cliff on a horse and not
getting hurt...spending long periods of time in freezing cold
water.....surviving extreme cold overnight inside a dead god
the list is endless....and for Leo&#x27;s so called acting don&#x27;t get me
started...a lot of crawling and moaning and groaning....the whole thing
was a letdown and really a waste of time...also tell the director to
back the camera up a bit on those facial close-ups...they were also me save your money and go see The Hateful Eight.

<div class="yn" id="ynd_3398112">

<form method="get"


Was the above review useful to you?

I just need the review between the <p> tags. But the page source has many other <p> tags that do not contain reviews. How can I get only the text of the reviews using BeautifulSoup?

PS: Source code from

Answer

Have you tried something like this?

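from bs4 import BeautifulSoup  # BeautifulSoup 4 (BS3 used "from BeautifulSoup import BeautifulSoup")

# html holds the page source as a string
soup = BeautifulSoup(html, "html.parser")

reviews = []
for tag in soup.find_all('p'):
    # skip <p> tags that contain a <b>, such as the spoiler warning
    if tag.find('b') is None:
        reviews.append(tag.get_text(strip=True))

print(reviews)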
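To try the "keep only <p> tags with no <b> inside" filter without fetching the page, here is a minimal sketch that runs it on a trimmed copy of the question's markup. It assumes the review is wrapped in its own <p>, as the question describes (the pasted snippet shows it as bare text). `SAMPLE` and `review_paragraphs` are illustrative names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; wrapping the review in <p> is an assumption.
SAMPLE = """
<p><b>*** This review may contain spoilers ***</b></p>
<p>I cannot believe how many people think this is a good movie.</p>
<div class="yn" id="ynd_3398112">Was the above review useful to you?</div>
"""


def review_paragraphs(html):
    """Return the text of every <p> that has no <b> descendant."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p") if p.find("b") is None]


print(review_paragraphs(SAMPLE))
```

The weakness from the question remains: any other <p> on the page without a <b> is returned too, which is why anchoring on a more specific tag can be more reliable.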

Or, even better, you could try find_previous_sibling('p'), or use that <div class="yn" id="ynd_3398112"> tag as an anchor. I noticed that the review is not inside that <div>, so you could use this to locate the data you're looking for. Sorry, but your question is not very clear.
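A minimal sketch of the <div> anchor idea, assuming each review is immediately followed by a <div class="yn"> as in the pasted snippet (whether the live page always follows that pattern is an assumption). It walks backwards over the div's siblings and takes the first non-empty text it finds, either a bare text node or a <p> with no <b>, and stops at any other tag. `SAMPLE` and `reviews_before_vote_divs` are hypothetical names for illustration.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed copy of the question's markup (review text shortened).
SAMPLE = """
<small>18 January 2016</small><br>
<p><b>*** This review may contain spoilers ***</b></p>

I cannot believe how many people think this is a good movie.

<div class="yn" id="ynd_3398112">
Was the above review useful to you?
</div>
"""


def reviews_before_vote_divs(html):
    """Return the review text that precedes each <div class="yn">."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for div in soup.find_all("div", class_="yn"):
        # Walk backwards from the div until real text turns up.
        for sib in div.previous_siblings:
            if isinstance(sib, NavigableString):
                text = sib.strip()  # bare text node; may be whitespace only
            elif isinstance(sib, Tag) and sib.name == "p" and sib.find("b") is None:
                text = sib.get_text(strip=True)  # review wrapped in its own <p>
            else:
                # Any other tag (e.g. the spoiler <p><b>...</b></p>) ends the search.
                break
            if text:
                reviews.append(text)
                break
    return reviews


print(reviews_before_vote_divs(SAMPLE))
```

Because it handles both a bare text node and a <p>, the same function works whether or not the review is wrapped in a paragraph tag.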
