jean jean - 3 months ago 18
Python Question

Convert HTML into CSV

I want to convert a HTML table as obtained from the script below into a CSV file, but got type error as follows:


TypeError: sequence item 0: expected string, Tag found


from bs4 import BeautifulSoup
import urllib2

url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
table = soup.find_all('table', class_='data2_s')
rows = table[0].find_all('tr')


How is the easiest way to convert it into a CSV file?
I tried as:

fo = open('fo.txt','w')
for r in rows:
fo.write(str(r.txt) + '\n')
fo.close()


but it wrote 'none'

The HTML is like this:

<table class="data2_s"><caption class="m">WAKKANAI   WMO Station ID:47401 Lat 45<sup>o</sup>24.9'N  Lon 141<sup>o</sup>40.7'E</caption><tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th><th scope="col">Mar</th><th scope="col">Apr</th><th scope="col">May</th><th scope="col">Jun</th><th scope="col">Jul</th><th scope="col">Aug</th><th scope="col">Sep</th><th scope="col">Oct</th><th scope="col">Nov</th><th scope="col">Dec</th><th scope="col">Annual</th></tr><tr class="mtx" style="text-align:right;"><td style="text-align:center">1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td><td class="data_0_0_0_0">-0.6</td><td class="data_0_0_0_0">4.7</td><td class="data_0_0_0_0">9.5</td><td class="data_0_0_0_0">11.6</td><td class="data_0_0_0_0">17.9</td><td class="data_0_0_0_0">22.2</td><td class="data_0_0_0_0">16.5</td><td class="data_0_0_0_0">10.7</td><td class="data_0_0_0_0">3.3</td><td class="data_0_0_0_0">-4.7</td><td class="data_0_0_0_0">6.8</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1939</td><td class="data_0_0_0_0">-7.5</td><td class="data_0_0_0_0">-6.6</td><td class="data_0_0_0_0">-1.4</td><td class="data_0_0_0_0">4.0</td><td class="data_0_0_0_0">7.5</td><td class="data_0_0_0_0">13.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">20.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">9.7</td><td class="data_0_0_0_0">3.0</td><td class="data_0_0_0_0">-2.5</td><td class="data_0_0_0_0">6.2</td></tr>

Answer

Here is another way without using csv module:

fp=open('data.csv','w')
for row in rows[:-1]:   # Removed last row as it has empty cells that gives error which can also be resolved if needed 
      fp.write(row.get_text(',') + '\n')
fp.close()

You can directly open data.csv file.

Station details can be get by below command:

>>>> table = soup.find_all('table', class_='data2_s')
>>>> print table[0].find_all('caption')[0].get_text()
WAKKANAI   WMO Station ID:47401 Lat 45o24.9'N  Lon 141o40.7'E

Hope this helps.