ranthero ranthero - 9 months ago 119
HTTP Question

Python 3 html table data

I'm new to Python and I need to get the data from a table on a
Webpage and send to a list.

I've tried everything, and the best I got is:

f = urllib.request.urlopen(url)
url = "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-enUS.asp?Data=11/22/2017&Data1=20171122&slcTaxa=APR#"
soup = BeautifulSoup(urllib.request.urlopen(url).read(),'lxml')
rows=list()
for tr in soup.findAll('table'):
rows.append(tr)


Any suggestions?

Answer Source

You're not that far !

First make sure to import the proper version of BeautifulSoup which is BeautifulSoup4 by doing apt-get install python3-bs4 (assuming you're on Ubuntu or Debian and running Python 3).

Then isolate the td elements of html table and clean data a bit. For example remove the first 3 elements of the lists which are useless, and remove the ugly '\n':

import urllib
from bs4 import BeautifulSoup
url = "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-enUS.asp?Data=11/22/2017&Data1=20171122&slcTaxa=APR#"
soup = BeautifulSoup(urllib.request.urlopen(url).read(),'lxml')
rows=list()
for tr in soup.findAll('table'):
    for td in tr:
        rows.append(td.string)
temp_list=rows[3:]
final_list=[element for element in temp_list if element != '\n']

I don't know which data you want to extract precisely. Now you need to work on your Python list (called final_list here)!

Hope it's clear.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download