Kunnain Kunnain - 1 month ago 12
Python Question

Web scraping (an issue while fetching a table)

I am trying to fetch information from a table, but for some reason I am not able to. I have no idea what exactly is missing in the code. The same code works (with a few changes) when fetching another table.

Here is my code.

import urllib.request
from bs4 import BeautifulSoup
import pandas as pd


htmlfile = urllib.request.urlopen("https://en.wikipedia.org/wiki/Demographics_of_Finland")
htmltext = htmlfile.read()
soup = BeautifulSoup(htmltext, "html.parser")  # name the parser explicitly to avoid the bs4 warning

all_tables=soup.find_all('table')

right_table=soup.find('table', class_='wikitable')
right_table



#Generate lists
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]
G=[]
H=[]
I=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    states=row.findAll('th') #To store second column data
    if len(cells)==8: #Only extract table body not heading
        A.append(cells[0].find(text=True))
        B.append(states[0].find(text=True))
        C.append(cells[1].find(text=True))
        D.append(cells[2].find(text=True))
        E.append(cells[3].find(text=True))
        F.append(cells[4].find(text=True))
        G.append(cells[5].find(text=True))
        H.append(cells[6].find(text=True))
        I.append(cells[7].find(text=True))



#Convert the lists to a data frame with pandas

df=pd.DataFrame(A,columns=['Number'])
df['Average_population_(x 1000)']=B
df['Live_births']=C
df['Deaths']=D
df['Natural_change']=E
df['Crude_birth_rate_(per 1000)']=F
df['Crude_death_rate_(per_1000)']=G
df['Natural_change_(per 1000)']=H
df['Total_fertility_rate']=I
df

Answer

I think you need read_html. This function returns a list of DataFrames, one for each HTML table found at the URL. You need the third DataFrame, so select it with [2]. Also, the first row of the table is the header, so add the parameter header=0, and the first column can be parsed as the index with the parameter index_col=0:

import pandas as pd

df = pd.read_html('https://en.wikipedia.org/wiki/Demographics_of_Finland', 
                  header=0, 
                  index_col=0)[2]
print(df)
     Average population (x 1000) Live births  Deaths Natural change  \
1900                       2 646      86 339  57 915         28 424   
1901                       2 667      88 637  56 225         32 412   
1902                       2 686      87 082  50 999         36 083   
1903                       2 706      85 120  49 992         35 128   
1904                       2 735      90 253  50 227         40 026   
1905                       2 762      87 841  52 773         35 068   
1906                       2 788      91 401  50 857         40 544   
1907                       2 821      92 457  53 028         39 429   
1908                       2 861      92 146  55 305         36 841   
1909                       2 899      95 005  50 577         44 428   
1910                       2 929      92 984  51 007         41 977   
1911                       2 962      91 238  51 648         39 590   
1912                       2 998      92 275  51 645         40 630   
1913                       3 026      87 250  51 876         35 374   
...
...
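
One follow-up note: values in the output above such as 2 646 use a space as the thousands separator, so they are probably stored as strings rather than numbers (I have not checked the dtypes on the live page, so treat this as an assumption). Below is a minimal sketch of one way to convert them. The sample frame is built by hand from the first three rows of the output, and the helper name to_number is hypothetical, not part of pandas.

```python
import pandas as pd

# Hand-built sample: the first rows of the printed output, kept as strings.
# This stands in for the frame returned by read_html (an assumption).
df = pd.DataFrame(
    {
        "Average population (x 1000)": ["2 646", "2 667", "2 686"],
        "Live births": ["86 339", "88 637", "87 082"],
        "Deaths": ["57 915", "56 225", "50 999"],
    },
    index=[1900, 1901, 1902],
)


def to_number(column):
    """Hypothetical helper: strip whitespace separators and convert to numbers."""
    # \s also matches non-breaking spaces, which HTML tables often contain.
    cleaned = column.astype(str).str.replace(r"\s+", "", regex=True)
    # Cells that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


numeric = df.apply(to_number)
print(numeric.dtypes)

# With numeric columns, arithmetic works, e.g. recomputing natural change.
print(numeric["Live births"] - numeric["Deaths"])
```

For 1900 the recomputed difference is 28424, which agrees with the Natural change column in the output above.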