The following BeautifulSoup script shows no output. Did I miss anything?
I expected it to reach at least one of the print statements.
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        return None
    try:
        bsObj = BeautifulSoup(html.read())
        title = bsObj.title.string
    except AttributeError as e:
        return None
    return title

title = getTitle(url1)
if title == None:
    print("None at URL: " + url1)
For BeautifulSoup4, I would recommend using the requests module (installable via pip) to fetch the page.
To get the HTML of the desired page, use:
content = requests.get(url).content
That stores the entire HTML document (as bytes) in the variable "content".
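As a slightly more defensive sketch of the same fetch, you can add a timeout and let requests raise on HTTP error statuses. The function name fetch_html and the 10-second timeout are my own choices for illustration, not part of the snippet above.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw response body (bytes) for url, or raise on failure."""
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into requests.exceptions.HTTPError.
    response.raise_for_status()
    return response.content
```

On success, fetch_html(url) returns what requests.get(url).content would; on a 404 or 500 it raises instead of quietly handing an error page to the parser.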
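From there, you can use the following script to print out any data you need.
Note: lxml (an HTML parser that works well with bs4) can be troublesome to install under Python 3 on some setups, which is why I used 2.7 here. If it will not install, Python's built-in "html.parser" can be passed to BeautifulSoup instead.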
import requests
from bs4 import BeautifulSoup as bs

def getTitle(url):
    content = requests.get(url).content
    page = bs(content, "lxml")
    title = page.title.string
    return title

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"
t = getTitle(url1)
if t is None:
    print("None at url " + url1)
else:
    print(t)
I tested this on my local machine (Win 10, Python 2.7.12, requests, beautifulsoup4, and lxml installed via pip) and it worked perfectly.
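One caveat with page.title.string: if the page has no <title> tag, page.title is None and the attribute access raises AttributeError. Below is a minimal sketch of a guard for that case. The helper name title_from_html is hypothetical, and it uses Python's built-in "html.parser" so that it runs without lxml.

```python
from bs4 import BeautifulSoup


def title_from_html(html):
    """Return the text of the <title> tag, or None if there is none."""
    page = BeautifulSoup(html, "html.parser")
    tag = page.title  # None when the document has no <title>
    if tag is None or tag.string is None:
        return None
    return tag.string.strip()


print(title_from_html("<html><head><title> Example </title></head></html>"))
print(title_from_html("<p>no title here</p>"))
```

The first call prints Example and the second prints None, so the caller can test for None instead of catching an exception.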
Hope that this has helped you.