I'm new to Python and I need to get the data from a table on a webpage and put it into a list.
I've tried everything, and the best I've got is:
import urllib.request
from bs4 import BeautifulSoup

url = "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-enUS.asp?Data=11/22/2017&Data1=20171122&slcTaxa=APR#"
soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'lxml')

rows = list()
for tr in soup.findAll('table'):
    rows.append(tr)
You're not that far off!

First, make sure the proper version of BeautifulSoup is installed, which is BeautifulSoup4, by doing apt-get install python3-bs4 (assuming you're on Ubuntu or Debian and running Python 3).

Then isolate the td elements of the HTML table and clean the data a bit: for example, remove the first 3 elements of the list, which are useless, and drop the stray '\n' entries:
import urllib.request
from bs4 import BeautifulSoup

url = "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-enUS.asp?Data=11/22/2017&Data1=20171122&slcTaxa=APR#"
soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'lxml')

rows = list()
for table in soup.findAll('table'):
    for td in table:              # iterate over the table's children
        rows.append(td.string)

temp_list = rows[3:]              # drop the first 3 elements, which are useless here
final_list = [element for element in temp_list if element != '\n']
I don't know precisely which data you want to extract; from here you'll need to work on your Python list (called final_list here)!
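If it helps as a starting point, here is a minimal sketch of one way to keep the row structure instead of a flat list. It reuses the soup object from the snippet above, and the names table_rows and cells are just illustrative:

table_rows = []
for table in soup.findAll('table'):
    for tr in table.findAll('tr'):
        # get_text(strip=True) trims surrounding whitespace such as the '\n' padding
        cells = [td.get_text(strip=True) for td in tr.findAll('td')]
        if cells:                 # skip rows with no <td> cells (e.g. header rows using <th>)
            table_rows.append(cells)

Each entry of table_rows is then the list of cell values for one table row, which may be easier to work with than a single flat list.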
Hope it's clear.