DineshMurani - 1 month ago
HTML Question

Retrieve links from a web page using Python and BeautifulSoup, then select the 3rd link and repeat 4 times

Here is the code:

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve all of the anchor tags

tags = soup('a')
for tag in tags:
    print tag.get('href', None)


The page has 18 links. I need to take the link at position 3 (the third link in the output), use it as the new input URL, and run the code again, repeating this 4 times. Then I need to print the name shown in the link at position 3 on the last run.

https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html

The URL above returns 18 links. We need to select the 3rd link, pass it as the new value of 'url', and repeat the loop above 4 times. The name in the last link retrieved is the answer; for example, the name in the first link is 'Fikret'. Hope this helps. Thank you for looking into it.

Answer

I was able to accomplish your homework in the following way (please take the time to learn this):

import urllib
from bs4 import BeautifulSoup

# This function will get the Nth link object from the given url.
# To be safe you should make sure the nth link exists (I did not)
def getNthLink(url, n):
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    return tags[n-1]

url = "https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html"

# This iterates 4 times, each time grabbing the 3rd link object
# For convenience it prints the url each time.
for i in xrange(4):
    tag = getNthLink(url,3)
    url = tag.get('href')
    print url

# Finally after 4 times we grab the content from the last tag
print tag.contents[0]
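
The code above is Python 2. For anyone on Python 3, here is a rough sketch of the same idea that also does the bounds check I skipped. Note that it swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs; the names AnchorCollector, nth_link, fetch and follow_links are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._href = None      # href of the <a> tag currently open
        self._inside = False   # True while we are between <a> and </a>
        self._text = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.links.append((self._href, "".join(self._text).strip()))
            self._inside = False


def nth_link(html, n):
    """Return (href, text) of the n-th anchor (1-based); raise if it is missing."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    if not 1 <= n <= len(parser.links):
        raise IndexError("page has %d links, wanted link %d" % (len(parser.links), n))
    return parser.links[n - 1]


def fetch(url):
    """Download a page and decode it as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def follow_links(start_url, position, repeats, fetch=fetch):
    """Follow the link at `position` `repeats` times.

    Returns (url, text) of the last link followed. `fetch` can be replaced
    with any function mapping a URL to HTML, which makes offline testing easy.
    """
    url, text = start_url, None
    for _ in range(repeats):
        href, text = nth_link(fetch(url), position)
        if href is None:
            raise ValueError("link %d on %s has no href" % (position, url))
        url = urljoin(url, href)  # also resolves relative hrefs
    return url, text
```

With this, calling follow_links(url, 3, 4) on the URL from the question should do the same job as the loop above: the second item of the returned pair is the link text (the name), and the first is the last URL followed.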