user1497050 - 1 year ago
Python Question

Extract image links from the webpage using Python

I want to get all of the pictures (the NBA team logos) on this page.

However, my code returns more than that. It gives me the whole anchor element:

<a href="/nba/teams/page/ORL"><img src="" alt="Orlando Magic" width="30" height="30" border="0" /></a>

How can I shorten it to give me only the image link?

My code:

import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('').read())

# take the first table with class "data borderTop" and skip its first two rows
rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]

for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        # the second cell holds the link that wraps the team image
        anchor = fields[1].find("a")
        if anchor:
            print anchor

Answer Source

I know this can be "traumatic", but for automatically generated pages like this one, where you just want to grab the images and never come back, a quick-and-dirty regular expression that matches the desired pattern tends to be my choice (not depending on Beautiful Soup is a great advantage):

import urllib, re

source = urllib.urlopen('').read()

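## every image name is an abbreviation made of capital letters, so match any
## absolute URL whose file name is capital letters followed by a literal ".png"
for link in re.findall(r'https?://[^\s"\']+/[A-Z]+\.png', source):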
    print link

    ## the code above just prints the link;
    ## if you want to actually download, set the flag below to True

    actually_download = False
    if actually_download:
        filename = link.split('/')[-1]
        urllib.urlretrieve(link, filename)
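
If a regular expression feels too fragile, here is a minimal sketch of another dependency-free route: the standard library's HTML parser, collecting the src attribute of every img tag. It is written for Python 3, unlike the snippets above. The names ImgSrcCollector and extract_image_links, and the example.com URL in the sample markup, are made up for illustration and are not taken from the page in the question.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that has a non-empty one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_links(html):
    """Return the src values of all <img> tags found in an HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical sample markup; the example.com URL is an assumption for illustration.
sample = (
    '<a href="/nba/teams/page/ORL">'
    '<img src="http://example.com/logos/ORL.png" alt="Orlando Magic" '
    'width="30" height="30" border="0" /></a>'
)
print(extract_image_links(sample))
```

For a real page you would pass the downloaded HTML, decoded to a string, to extract_image_links instead of the sample. This returns every image on the page, so you may still want to filter the result, for example by file name.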

Hope this helps!
