jcruzer jcruzer - 1 year ago 40
Python Question

Embedded For Loops while using range

I would like the following code to grab the date from each address in this range, but I can't get it to run more than once. I am using Python 3. As you can see below, the URL for the site is appended with i so that it reads http://zinc.docking.org/substance/10 ; http://zinc.docking.org/substance/11 ... and so on. Here is the code:

import bs4 as bs
import urllib.request
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
for row in table1.findAll('tr'):
    row1 = row.findAll('td')
ate = row1[0].getText()
print(ate)


This is my output:

$python3 Reset.py
November 11th, 2005


The script should, however, give me 3 dates. This code runs, so I know that row1[0] does in fact contain a value. I feel like there is some sort of simple formatting error, but I am not sure where to begin troubleshooting. When I format it "correctly", this is the code:

import bs4 as bs
import urllib.request
import pandas as pd
import csv
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
    table2 = soup.find("table", attrs={"class": "protomers"})
    for row in table1.findAll('tr'):
        row1 = row.findAll('td')
        ate = row1[0].getText()
        print(ate)


The error I get is as follows:

Traceback (most recent call last):
  File "Reset.py", line 14, in <module>
    ate = row1[0].getText()
IndexError: list index out of range


The first version runs, so I know that row1[0] does in fact contain a value. Any ideas?

Answer Source

The problem is that the inner loop calls findAll('td') on every row of the table, including the header row. Header cells are 'th' elements, not 'td', so for that row the returned list has a length of 0, which is why row1[0] raises "list index out of range". You should also verify that the table lookup did not return None (I do not know what you are doing with table2 based on the code you posted, but the check would be the same):

if table1 is not None:
    for row in table1.findAll('tr'):
        row1 = row.find_all('td')
        if len(row1) != 0:
            ate = row1[0].getText()
            print(ate)
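
To see both guards working together with the nested loop, here is a minimal sketch that runs without touching the network. Several things in it are assumptions rather than anything from the question: SAMPLE_HTML is a made-up stand-in for one downloaded page (the real page layout may differ), first_cells and scrape_range are hypothetical helper names, and the parser is swapped from 'lxml' to Python's built-in 'html.parser' so that no extra install is needed.

```python
import bs4 as bs

# Hypothetical stand-in for one downloaded page; the real pages may differ.
SAMPLE_HTML = """
<table class="substance-properties">
  <tr><th>Added</th><th>Available</th></tr>
  <tr><td>November 11th, 2005</td><td>Yes</td></tr>
</table>
"""


def first_cells(html):
    """Return the text of the first <td> in each row of the properties table."""
    soup = bs.BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "substance-properties"})
    if table is None:
        # The page has no such table, so there is nothing to read.
        return []
    cells = []
    for row in table.find_all("tr"):
        tds = row.find_all("td")
        if tds:
            # Header rows hold only <th> cells, so tds is empty for them.
            cells.append(tds[0].get_text(strip=True))
    return cells


def scrape_range(fetch, start, stop):
    """Call fetch(i) for every id in range(start, stop) and parse each page."""
    results = {}
    for i in range(start, stop):
        # Everything indented under this `for` runs once per page.
        results[i] = first_cells(fetch(i))
    return results


# Offline demo: every id returns the same sample page.
print(scrape_range(lambda i: SAMPLE_HTML, 10, 12))
print(first_cells("<p>no table here</p>"))
```

For the real site, fetch could be something like lambda i: urllib.request.urlopen("%s%i" % (site, i)).read(), with site defined as in the question. The important part is that the parsing call sits inside the body of the for i loop, so it runs for every page rather than only for the last one fetched.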