Alex Alex - 8 days ago 6
Python Question

Locating and Printing HTML Hyperlink name

I want to be able to grab 'link to google' and print it from this:

<a href= "http://www.google.com">link to google</a>


This bottom code is able to grab the link, but I'm not sure how to make it grab the normal text.

def handle_starttag(self, tag, attrs):

if tag == 'a':
self.anchor = True
if self.anchor == True:
for attr in attrs:
if attr[0] == 'href':
print(attr[1])

Answer

Use it with handle_data:

def handle_starttag(self, tag, attrs):
    if tag == 'a':
        self.anchor = True

def handle_data(self, data):
    if self.anchor:
        print('anchor data is:', data)
    self.anchor = False

That will go into an a tag and set self.anchor true, then stumble upon data (within the tag) and if the last tag was a will print the data. Anyways, after that round, the self.anchor will be false again, preventing multiple false anchors detection.

Comments