McLeodx - 5 months ago
Python Question

How am I getting two different results from the same Python print command?

For the first `print tag` I am getting a large list of hundreds of `<a>` tags. For the second `print tag` I am getting a list with four `<a>` tags, not including the ones that I want.

One of the tags that I want is at the end of `tags`. After printing all several hundred tags, I print the last tag, and that prints the correct end tag as it should. But when I run another `for` loop over the same (unchanged) list `tags`, the result is not just different but significantly different.

The phenomenon happens with or without the `print '\n\n\n'`; that line is only there to make the split between the two prints easier for me to see.

What is happening to this list in between the first and second `for` loop to cause this problem?

(This code is exactly as I have it in my script. Originally I didn't have the lines from the first `for` loop until the empty line; I added them to debug why the correct URL is missing from the end result.)

EDIT: Also, after the code, here is what is printed by all the `print` statements (only the last section of the output from the first `print` within the `for` loop is shown):

import urllib
from bs4 import BeautifulSoup

startingList = ['http://www.stowefamilylaw.co.uk/']
for url in startingList:
    try:
        html = urllib.urlopen(url)
        soup = BeautifulSoup(html,'lxml')
        tags = soup('a')
        for tag in tags:
            print tag
        print tags[-1]
        print '\n\n\n'

        for tag in tags:
            print tag
            if not tag.get('href', None).startswith('..'):
                continue
    except:
        continue

....

<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/decrees-orders-forms/" itemprop="url">Decrees, Orders &amp; Forms</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/international-divorce/" itemprop="url">International Divorce</a>
<a class="shiftnav-target"><i class="fa fa-chevron-left"></i> Back</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>




<a href="http://www.stowefamilylaw.co.uk/">Stowe Family Law</a>
<a href="#spu-5086" style="color: #fff"><div class="callbackbutton"><i class="fa fa-phone" style="font-size: 16px"></i> Request Callback </div></a>
<a href="#spu-5084" style="color: #fff"><div class="callbackbutton"><i class="fa fa-envelope-o" style="font-size: 16px"></i> Quick Enquiry </div></a>
<a class="ubermenu-responsive-toggle ubermenu-responsive-toggle-main ubermenu-skin-black-white-2 ubermenu-loc-primary" data-ubermenu-target="ubermenu-main-3-primary"><i class="fa fa-bars"></i>Main Menu</a>

Answer

You have a blanket `except:` handler:

try:
    # ...
except:
    continue

so any error in the block is masked and the rest of that loop iteration is skipped. Don't ever use blanket `except:` handlers without re-raising; see Why is "except: pass" a bad programming practice?. At the very least, catch only `Exception` and print the error:

except Exception as e:
    print 'Encountered:', e

Without proper diagnostics all we can do is guess.
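
To see why a loop like the second one stops early, it helps to let the exception report itself. Here is a minimal sketch that uses plain dictionaries as stand-ins for the parsed tags. The `fake_tags` data and the `collect_hrefs` name are made up for illustration, and the parenthesised `print` form is used so it runs on both Python 2 and 3:

```python
from __future__ import print_function

import traceback

# Hypothetical stand-ins for parsed <a> tags; the third one has no href.
fake_tags = [
    {'href': 'http://www.stowefamilylaw.co.uk/'},
    {'href': '#spu-5086'},
    {'class': 'ubermenu-responsive-toggle'},
    {'href': '../never-reached/'},
]


def collect_hrefs(tags):
    """Return (relative hrefs seen, error or None), mimicking the question's loop."""
    seen = []
    try:
        for tag in tags:
            if not tag.get('href', None).startswith('..'):
                continue
            seen.append(tag['href'])
    except Exception as e:
        # Report the failure instead of silently swallowing it.
        print('Encountered:', repr(e))
        traceback.print_exc()
        return seen, e
    return seen, None


seen, error = collect_hrefs(fake_tags)
```

The relative link after the tag without an `href` is never reached, which matches the symptom of a loop that appears to end early.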

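One error you definitely have is an `AttributeError` here when a tag has no `href` attribute, because the `None` object doesn't have a `startswith` attribute. The fourth tag printed by your second loop has no `href`, which is why the output stops there: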

if not tag.get('href', None).startswith('..'):

Instead of `None`, return an empty string as the default:

if not tag.get('href', '').startswith('..'):
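
The difference between the two defaults can be checked without any parsing. This is a small sketch, assuming a plain dict is a fair stand-in for a tag's attributes (as far as I know, `Tag.get` looks the key up in the tag's attribute dictionary the same way `dict.get` does). The names `attrs_without_href`, `attrs_with_href` and `is_relative` are illustrative only:

```python
from __future__ import print_function

# Hypothetical attribute mappings: one <a> without an href, one with a relative href.
attrs_without_href = {'class': 'shiftnav-target'}
attrs_with_href = {'href': '../contact/'}


def is_relative(attrs):
    """True when the href starts with '..'; a missing href counts as not relative."""
    return attrs.get('href', '').startswith('..')


# With None as the default, the lookup itself succeeds but the method call fails.
try:
    attrs_without_href.get('href', None).startswith('..')
    failed = False
except AttributeError:
    failed = True

print(failed, is_relative(attrs_without_href), is_relative(attrs_with_href))
```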

or, better yet, select only `a` tags that have an `href` attribute:

tags = soup.select('a[href]')
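
As a rough illustration of what the selector changes, the sketch below parses a few made-up links modelled on the question's output, one of them without an `href`. The markup is invented, and `html.parser` is used only so the example does not depend on `lxml` being installed:

```python
from __future__ import print_function

from bs4 import BeautifulSoup

# Made-up markup modelled on the question's output: the second link has no href.
html = (
    '<a href="http://www.stowefamilylaw.co.uk/">Stowe Family Law</a>'
    '<a class="shiftnav-target">Back</a>'
    '<a href="../contact/">Contact</a>'
)

# 'html.parser' ships with Python; the question used 'lxml' instead.
soup = BeautifulSoup(html, 'html.parser')

all_links = soup('a')                 # every <a>, with or without href
with_href = soup.select('a[href]')    # only <a> tags carrying an href

# Safe to index with tag['href'] here, because every selected tag has one.
relative = [tag['href'] for tag in with_href if tag['href'].startswith('..')]
print(len(all_links), len(with_href), relative)
```

Because every tag returned by the selector has an `href`, the later `startswith` check no longer needs a default value at all.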