jean jean - 3 months ago 5
Python Question

Unable to fetch Table from BeautifulSoup

from BeautifulSoup import BeautifulSoup
import urllib2

url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
table = soup.find('table')
print table


This does not return the table I expected.

I want to grab the table below:

[image: screenshot of the target table]

Answer

First off, use bs4; BeautifulSoup 3 is no longer maintained. Also, the table you want has the class data2_s. Calling find("table") just returns the first table on the page, which is not the one you want:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
# select_one returns the first element matching the CSS selector
table = soup.select_one("table.data2_s")
# equivalent: table = soup.find("table", class_="data2_s")
print table

Which gives you:

<table class="data2_s"><caption class="m">WAKKANAI   WMO Station ID:47401 Lat 45<sup>o</sup>24.9'N  Lon 141<sup>o</sup>40.7'E</caption><tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th><th scope="col">Mar</th><th scope="col">Apr</th><th scope="col">May</th><th scope="col">Jun</th><th scope="col">Jul</th><th scope="col">Aug</th><th scope="col">Sep</th><th scope="col">Oct</th><th scope="col">Nov</th><th scope="col">Dec</th><th scope="col">Annual</th></tr><tr class="mtx" style="text-align:right;"><td style="text-align:center">1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td><td class="data_0_0_0_0">-0.6</td><td class="data_0_0_0_0">4.7</td><td class="data_0_0_0_0">9.5</td><td class="data_0_0_0_0">11.6</td><td class="data_0_0_0_0">17.9</td><td class="data_0_0_0_0">22.2</td><td class="data_0_0_0_0">16.5</td><td class="data_0_0_0_0">10.7</td><td class="data_0_0_0_0">3.3</td><td class="data_0_0_0_0">-4.7</td><td class="data_0_0_0_0">6.8</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1939</td><td class="data_0_0_0_0">-7.5</td><td class="data_0_0_0_0">-6.6</td><td class="data_0_0_0_0">-1.4</td><td class="data_0_0_0_0">4.0</td><td class="data_0_0_0_0">7.5</td><td class="data_0_0_0_0">13.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">20.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">9.7</td><td class="data_0_0_0_0">3.0</td><td class="data_0_0_0_0">-2.5</td><td class="data_0_0_0_0">6.2</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1940</td><td class="data_0_0_0_0">-6.0</td><td class="data_0_0_0_0">-5.7</td><td class="data_0_0_0_0">-0.5</td><td class="data_0_0_0_0">3.5</td><td class="data_0_0_0_0">8.5</td><td class="data_0_0_0_0">11.0</td><td class="data_0_0_0_0">16.6</td><td class="data_0_0_0_0">19.7</td><td class="data_0_0_0_0">15.6</td><td class="data_0_0_0_0">10.4</td><td class="data_0_0_0_0">3.7</td><td class="data_0_0_0_0">-1.0</td><td class="data_0_0_0_0">6.3</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1941</td><td class="data_0_0_0_0">-6.5</td><td class="data_0_0_0_0">-5.8</td><td class="data_0_0_0_0">-2.6</td><td class="data_0_0_0_0">3.6</td><td class="data_0_0_0_0">8.1</td><td class="data_0_0_0_0">11.4</td><td class="data_0_0_0_0">12.7</td><td class="data_0_0_0_0">16.5</td><td class="data_0_0_0_0">16.0</td><td class="data_0_0_0_0">10.0</td><td class="data_0_0_0_0">4.0</td><td class="data_0_0_0_0">-2.9</td><td class="data_0_0_0_0">5.4</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1942</td><td class="data_0_0_0_0">-7.8</td><td class="data_0_0_0_0">-8.2</td><td class="data_0_0_0_0">-0.8</td><td class="data_0_0_0_0">3.5</td><td class="data_0_0_0_0">7.1</td><td class="data_0_0_0_0">12.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">18.4</td><td class="data_0_0_0_0">15.7</td><td class="data_0_0_0_0">10.5</td><td class="data_0_0_0_0">2.5</td><td class="data_0_0_0_0">-2.9</td><td class="data_0_0_0_0">5.6</td></tr>
(remaining rows omitted)
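
If it helps, here is a minimal offline sketch of the same idea, followed by one possible way to turn the selected table into plain Python lists. It is not the page's real markup: the inline HTML is a trimmed-down stand-in (the first "navigation" table is hypothetical, and only a few columns of the 1938 and 1939 rows are kept), the helper name table_to_rows is made up for illustration, and the parser is set explicitly to "html.parser" as an assumption so the result does not depend on which parsers are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: an unrelated table comes first,
# then the data table carrying the class "data2_s".
HTML = """
<html><body>
<table><tr><td>navigation</td></tr></table>
<table class="data2_s">
<tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th><th scope="col">Annual</th></tr>
<tr class="mtx"><td>1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td><td class="data_0_0_0_0">6.8</td></tr>
<tr class="mtx"><td>1939</td><td class="data_0_0_0_0">-7.5</td><td class="data_0_0_0_0">-6.6</td><td class="data_0_0_0_0">6.2</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find("table") stops at the first <table> in document order.
first_table = soup.find("table")

# The CSS selector narrows the match to the table with class "data2_s".
data_table = soup.select_one("table.data2_s")


def table_to_rows(table):
    """Return (header, rows) as lists of stripped cell text.

    Assumes data rows are the <tr> elements with class "mtx", as in the
    output shown in the answer.
    """
    header = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr", class_="mtx"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return header, rows


header, rows = table_to_rows(data_table)

print(first_table.get("class"))  # the first table has no class attribute
print(data_table.get("class"))
print(header)
print(rows)
```

The cell values come back as strings; converting them with float() is left out here because the real page may contain blank or annotated cells.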