import pandas
import bs4
import urllib.request
from urllib.request import Request, urlopen
data_df = pandas.read_csv("tickers.csv")
print(data_df.columns[0])
req = Request("http://performance.morningstar.com/perform/Performance/stock/annual-dividends.action?&t=XSES:D05&region=sgp&culture=en-US&cur=&ops=clear&ndec=2&y=5", headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(req).read()
soup = bs4.BeautifulSoup(webpage, "lxml")
table = soup.find("th", {"class": "row_lbl"})
print(table.nextSibling.text)
AttributeError: 'NavigableString' object has no attribute 'text'
This is the tr element from your URL:
<tr>
<th class="row_lbl">Dividend Amount</th>
<td align="right">0.56</td>
<td align="right">0.56</td>
<td align="right">0.58</td>
<td align="right">0.60</td>
<td align="right">0.60</td>
</tr>
Here, the "th" element with class row_lbl (i.e. your table variable) is:
<th class="row_lbl">Dividend Amount</th>
so table.text returns "Dividend Amount".
In BeautifulSoup, the nodes of a parsed document include not only tags but also the text and whitespace between them, which BeautifulSoup represents as NavigableString objects.
In your case there are whitespace/line breaks between the "th" and "td" tags. These are also treated as nodes (NavigableString), and they do not have the properties of HTML tag elements. So table.nextSibling gives you the line break after the "th", not the first "td".
If your HTML were written on a single line like the following, there would be no NavigableStrings between the tags:
<th class="row_lbl">Dividend Amount</th><td align="right">0.56</td><td align="right">0.56</td><td align="right">0.58</td><td align="right">0.60</td><td align="right">0.60</td>
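You can check the difference on a small inline snippet. This is only an illustrative sketch: the two HTML strings and the first_sibling helper are made-up stand-ins for the real row, and it uses the built-in "html.parser" instead of "lxml" so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical stand-ins for the real row: same tags, with and without line breaks
spaced = '<tr>\n<th class="row_lbl">Dividend Amount</th>\n<td align="right">0.56</td>\n</tr>'
compact = '<tr><th class="row_lbl">Dividend Amount</th><td align="right">0.56</td></tr>'


def first_sibling(html):
    """Return the node that immediately follows the row label (hypothetical helper)."""
    th = BeautifulSoup(html, "html.parser").find("th", {"class": "row_lbl"})
    return th.next_sibling


spaced_sibling = first_sibling(spaced)    # the line break after </th>
compact_sibling = first_sibling(compact)  # the first <td> tag

# Print the node type and its repr so the whitespace node is visible
print(type(spaced_sibling).__name__, repr(spaced_sibling))
print(type(compact_sibling).__name__, repr(compact_sibling))
```

With the line breaks present, the first sibling is a NavigableString holding the newline; without them, it is the "td" tag itself.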
To skip the NavigableStrings in the current case, try:
# Walk every node that follows the <th>, keeping only the <td> tags
while True:
    table = table.next_sibling
    if table is None:
        break
    print(table)  # you will see that this sometimes prints only whitespace
    try:
        tag_name = table.name
    except AttributeError:
        # NavigableString nodes may not expose a tag name
        tag_name = ""
    if tag_name == "td":
        print(table.text)
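As an alternative to the manual loop, BeautifulSoup's find_next_siblings method can filter the siblings by tag name, which leaves the whitespace strings out. A minimal sketch, assuming the row looks like the tr element quoted above (the html string here is a hand-copied stand-in for the downloaded page, parsed with the built-in "html.parser"; the names label and dividends are just illustrative):

```python
from bs4 import BeautifulSoup

# Hypothetical copy of the row from the page, including the line breaks
html = """<tr>
<th class="row_lbl">Dividend Amount</th>
<td align="right">0.56</td>
<td align="right">0.56</td>
<td align="right">0.58</td>
<td align="right">0.60</td>
<td align="right">0.60</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find("th", {"class": "row_lbl"})

# find_next_siblings("td") matches only <td> tags, so the whitespace
# NavigableStrings between them never show up in the result
dividends = [td.get_text(strip=True) for td in label.find_next_siblings("td")]
print(dividends)
```

If you only need the first value, label.find_next_sibling("td") returns just the first matching tag.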