Jacs Jacs - 1 year ago 94
Python Question

Python BeautifulSoup does not print()

The following BeautifulSoup script shows no output. Did I miss anything?
It was supposed to hit at least one of the prints.

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj

title = getTitle(url1)

if title == None:
    print("None at URL: " + url1)

Answer Source

For BeautifulSoup4, I would recommend using the requests module (installed via pip) to fetch the website data.

To get the html of the desired site, use

content = requests.get(url).content

That saves the entire HTML document to the variable "content".

From that, you can use the following script to print out any data you need.

Note: lxml (an HTML parser that works well with bs4) gave me problems when installing under Python 3, so I used 2.7 for this. If lxml is not available, bs4 also accepts the built-in "html.parser".

import requests
from bs4 import BeautifulSoup as bs

def getTitle(url):
    # Download the page and parse it with the lxml parser
    content = requests.get(url).content
    page = bs(content, "lxml")
    # page.title is the <title> tag; .string is its text
    title = page.title.string
    return title

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"
t = getTitle(url1)

if t is None:
    print("None at url " + url1)
else:
    print(t)

I tested this on my local machine (Win 10, Python 2.7.12, requests, beautifulsoup4, and lxml installed via pip) and it worked perfectly.
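
For Python 3, which the imports in the question (urllib.request) suggest, a minimal sketch might look like the one below. It assumes the built-in "html.parser" instead of lxml, and the helper names title_from_html and get_title are only illustrative, not part of requests or bs4. Note that the script in the question prints only when the result is None, so a successful fetch produces no output; this version prints something in both cases.

```python
import requests
from bs4 import BeautifulSoup


def title_from_html(html):
    """Return the text of the <title> tag, or None if the page has none."""
    # "html.parser" ships with Python, so lxml is not required
    # (assumption: parser speed does not matter here).
    page = BeautifulSoup(html, "html.parser")
    if page.title is None or page.title.string is None:
        return None
    return page.title.string.strip()


def get_title(url):
    """Fetch url and return its title, or None if the request fails."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException:
        return None
    return title_from_html(response.content)


if __name__ == "__main__":
    url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"
    t = get_title(url1)
    if t is None:
        print("None at url " + url1)
    else:
        print(t)
```

Splitting the parsing into its own function means it can be checked against a plain HTML string without touching the network.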

If you want more information on requests or BeautifulSoup, see the official documentation for each.

Hope that this has helped you.
