OOI YI YONG - 4 years ago
Python Question

Python BeautifulSoup (need Guidance not answer only)

import pandas
import bs4
import urllib.request
from urllib.request import Request, urlopen

data_df = pandas.read_csv("tickers.csv")

print(data_df.columns[0])

req = Request("http://performance.morningstar.com/perform/Performance/stock/annual-dividends.action?&t=XSES:D05&region=sgp&culture=en-US&cur=&ops=clear&ndec=2&y=5", headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(req).read()

soup = bs4.BeautifulSoup(webpage, "lxml")

table = soup.find("th", {"class": "row_lbl"})

print(table.nextSibling.text)


If I run print(table.text), my output is "Dividend Amount", which is correct.

Can anyone explain why .nextSibling is not working here? I need an explanation and not just a straight answer; I am new to Python and coding, and I want to learn.

The error for .nextSibling.text is below:

AttributeError: 'NavigableString' object has no attribute 'text'

Answer Source

This is the tr element from your URL:

<tr> 
                <th class="row_lbl">Dividend Amount</th>
                <td align="right">0.56</td>
                <td align="right">0.56</td>
                <td align="right">0.58</td>
                <td align="right">0.60</td>
                <td align="right">0.60</td>
            </tr> 

Here, table is the "th" element with class row_lbl, i.e. table =

<th class="row_lbl">Dividend Amount</th>

So table.text should return "Dividend Amount".

In BeautifulSoup, the children of an element include not only tags but also the text and whitespace between them, which BeautifulSoup represents as NavigableString objects.

In your case there are whitespace and line breaks between the "th" tag and the "td" tags. These are also treated as nodes (NavigableString), and they do not have the properties of HTML tags, so table.nextSibling is the whitespace string rather than the first "td".

So if your HTML was like the line below, there would be no NavigableStrings between the tags:

 <th class="row_lbl">Dividend Amount</th><td align="right">0.56</td><td align="right">0.56</td><td align="right">0.58</td><td align="right">0.60</td><td align="right">0.60</td>
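The difference between the two layouts can be checked directly. The following is a minimal sketch, assuming a trimmed copy of the row above wrapped in a table, the built-in "html.parser" instead of lxml, and a hypothetical helper named first_sibling_kind; none of these come from the original page.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed sample markup: the same row, once with line breaks and once without.
spaced_html = """<table><tr>
    <th class="row_lbl">Dividend Amount</th>
    <td align="right">0.56</td>
    <td align="right">0.58</td>
</tr></table>"""
compact_html = (
    '<table><tr><th class="row_lbl">Dividend Amount</th>'
    '<td align="right">0.56</td></tr></table>'
)


def first_sibling_kind(html):
    """Return the class name of the node right after the th, and whether it is a Tag."""
    soup = BeautifulSoup(html, "html.parser")
    th = soup.find("th", {"class": "row_lbl"})
    sibling = th.next_sibling
    return type(sibling).__name__, isinstance(sibling, Tag)


print(first_sibling_kind(spaced_html))   # ('NavigableString', False)
print(first_sibling_kind(compact_html))  # ('Tag', True)
```

With the line breaks present, the node right after the "th" is the whitespace string, which is why the attribute lookup in the question fails.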

To skip the NavigableStrings in the current case, try:

while True:
    table = table.nextSibling
    if table is None:  # no more siblings in this row
        break
    print(table)  # you will see that this prints whitespace sometimes
    try:
        tag_name = table.name
    except AttributeError:
        # whitespace/text nodes may not expose a tag name
        tag_name = ""
    if tag_name == "td":
        print(table.text)
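As an alternative to walking the siblings by hand, BeautifulSoup also provides find_next_sibling and find_next_siblings, which only return tags and so pass over the whitespace strings. This is a sketch under the same assumptions as the loop above: the markup is a copy of the row shown earlier wrapped in a table, and the variable names (label, first_cell, amounts) are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumed sample markup copied from the row shown in the answer.
html = """<table><tr>
    <th class="row_lbl">Dividend Amount</th>
    <td align="right">0.56</td>
    <td align="right">0.56</td>
    <td align="right">0.58</td>
    <td align="right">0.60</td>
    <td align="right">0.60</td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find("th", {"class": "row_lbl"})

# find_next_sibling returns the first following sibling that is a matching tag.
first_cell = label.find_next_sibling("td")

# find_next_siblings returns every following sibling tag that matches.
amounts = [cell.get_text(strip=True) for cell in label.find_next_siblings("td")]

print(first_cell.text)
print(amounts)
```

The manual loop is still useful for seeing what the tree actually contains; these methods are just shorter once you know the whitespace nodes are there.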