DineshMurani - 6 months ago
HTML Question

Retrieve links from a web page using Python and BeautifulSoup, then follow the 3rd link and repeat 4 times

Here is the code.

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve all of the anchor tags

tags = soup('a')
for tag in tags:
    print tag.get('href', None)

There are 18 links. I now need to take the link at position 3 (the third link in the output), use it as the new input URL, and run the code again, 4 times in total. Whatever link is at position 3 on the last run, I need to print the name it contains.


The code above returns 18 links from the page. We need to select the 3rd link, pass it to 'url' as the new input, and repeat the loop 4 times. The name in the last link is the output; for example, the name in the first link is 'fikret'. Hope this helps. Thank you for looking into it.


I was able to accomplish your homework in the following way (please take the time to learn this):

import urllib
from bs4 import BeautifulSoup

# This function will get the Nth link object from the given url.
# To be safe you should make sure the nth link exists (I did not)
def getNthLink(url, n):
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    return tags[n-1]

url = ""  # set this to the starting page URL before running

# This iterates 4 times, each time grabbing the 3rd link object
# For convenience it prints the url each time.
for i in xrange(4):
    tag = getNthLink(url,3)
    url = tag.get('href')
    print url

# Finally after 4 times we grab the content from the last tag
print tag.contents[0]
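
If you are on Python 3, or want the safety check that the comment above says was skipped, here is a rough sketch of the same idea. A few things in it are my own assumptions and not part of the code above: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without installing anything, the names AnchorCollector, nth_link, fetch and follow_links are made up for illustration, pages are assumed to be UTF-8, and relative hrefs are resolved with urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collects [href, text] for every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []        # one [href, text] pair per anchor
        self._inside = False   # True while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append([dict(attrs).get("href"), ""])
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.links[-1][1] += data


def nth_link(html, n):
    """Return (href, text) of the n-th anchor, counting from 1."""
    parser = AnchorCollector()
    parser.feed(html)
    if not 1 <= n <= len(parser.links):
        raise IndexError("page has %d links, wanted link %d"
                         % (len(parser.links), n))
    href, text = parser.links[n - 1]
    return href, text.strip()


def fetch(url):
    """Download a page as text (assumes UTF-8; hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def follow_links(start_url, position=3, repeats=4, fetch_page=fetch):
    """Follow the link at `position`, `repeats` times in a row.

    Returns the last URL reached and the text of the last link followed.
    """
    url, text = start_url, ""
    for _ in range(repeats):
        href, text = nth_link(fetch_page(url), position)
        if href is None:
            raise ValueError("link %d on %s has no href" % (position, url))
        url = urljoin(url, href)  # resolve relative hrefs against this page
    return url, text
```

Calling follow_links with your starting URL returns the final URL and the text of the last link followed, which is the name you are after. The fetch_page argument is only there so the loop can be tried against pages held in a dict instead of the network.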