Albert Albert - 1 month ago 9
HTML Question

How to get the URL and the title from <a> tags with BeautifulSoup

I'm writing a script to get all the links from the divs with class="pntc-txt". From each <a> tag I want to get the href attribute and also the text between the tags, as in <a>Something</a>. Afterwards I want to take that URL and text and insert them into a database. Here is the code I have so far:

import urllib.request
from bs4 import *

sock = urllib.request.urlopen("http://as.com/tag/moto_gp/a/")
htmlSource = sock.read()
sock.close()

soup = BeautifulSoup(htmlSource)


for div in soup.findAll('div', {'class': 'pntc-txt'}):
    a = div.findAll('a')
    print(a)

Answer

Try this:

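import requests
from bs4 import BeautifulSoup

srcCode = requests.get("http://as.com/tag/moto_gp/a/")
plainText = srcCode.text

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(plainText, "html.parser")

for div in soup.find_all('div', {'class': 'pntc-txt'}):
    for each in div.find_all('a'):     # every <a> tag inside this div
        href = each.get('href')        # None if the tag has no href attribute
        print(href)                    # the link URL
        print(each.string)             # the text inside the tag (None if it has several child nodes)
        print(each)                    # the whole tag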

Note: I also replaced the urllib code that fetched the HTML page with the requests package.
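
Since you mention inserting the URL and text into a database afterwards, here is a minimal sketch of that step using the sqlite3 module from the standard library. The table name links, the helper save_links and the sample pairs are all made up for illustration; in your script the list would be filled with the (href, text) pairs collected in the loop above, and you may well be using a different database.

```python
import sqlite3


def save_links(conn, links):
    """Insert (url, title) pairs, skipping URLs that are already stored."""
    # Hypothetical schema: one row per URL, with the link text as its title.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, title TEXT)"
    )
    # Parameterized query: scraped text is never interpolated into the SQL.
    conn.executemany(
        "INSERT OR IGNORE INTO links (url, title) VALUES (?, ?)", links
    )
    conn.commit()


# Hypothetical sample data standing in for the pairs collected while scraping.
sample_links = [
    ("http://example.com/a", "First article"),
    ("http://example.com/b", "Second article"),
    ("http://example.com/a", "First article"),  # duplicate URL, ignored
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
save_links(conn, sample_links)
rows = conn.execute("SELECT url, title FROM links ORDER BY url").fetchall()
print(rows)
```

Making url the primary key together with INSERT OR IGNORE means that running the scraper again should not create duplicate rows for links it has already seen.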
