I have a simple program that grabs the text of an article from Fox News, but for some reason I'm having trouble getting the quotation marks to decode correctly.
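import urllib  # Python 2 API; in Python 3 this is urllib.request.urlopen
from bs4 import BeautifulSoup

r = urllib.urlopen('http://www.foxnews.com/politics/2016/10/14/emails-reveal-clinton-teams-early-plan-for-handling-bill-sex-scandals.html').read()
soup = BeautifulSoup(r, 'html.parser')
# Print the text of every article-body div
for item in soup.find_all('div', class_='article-text'):
    print(item.get_text())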
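One plausible cause, offered as an assumption rather than a confirmed diagnosis: if the page is served as UTF-8 but its bytes are decoded as Windows-1252 (or printed to a console using that code page), a curly apostrophe turns into `â€™`. The Python 3 sketch below reproduces that with the standard library only; the sample string is hypothetical. With Beautiful Soup you can force the codec through its `from_encoding` argument, e.g. `BeautifulSoup(r, 'html.parser', from_encoding='utf-8')`.

```python
# Hypothetical sample: a right single quotation mark (U+2019), as in "Clinton's".
original = 'Clinton\u2019s'

# Simulate the suspected bug: UTF-8 bytes decoded with the wrong codec (cp1252).
raw_bytes = original.encode('utf-8')
garbled = raw_bytes.decode('cp1252')  # 'Clinton\u00e2\u20ac\u2122s', i.e. the familiar mojibake

# If you still have the raw bytes, decode them with the correct codec.
fixed = raw_bytes.decode('utf-8')

# If you only have the garbled text, undo the wrong decode and redo it correctly.
repaired = garbled.encode('cp1252').decode('utf-8')
```

This keeps the quotation marks intact instead of discarding them, which is why it is worth trying before stripping characters.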
This doesn't explain why Beautiful Soup was having trouble decoding the text, but I have found two roundabout workarounds. One is to declare an encoding at the top of the script:
# This Python file uses the following encoding: utf-8
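For context: per PEP 263, that comment tells Python how to decode the bytes of the script file itself, and it is only honored on line 1 or 2 (without one, Python 2 assumes ASCII and Python 3 assumes UTF-8). The sketch below uses Python 3's standard-library `tokenize.detect_encoding` to show how the declaration is read; the `declared_encoding` helper and the sample sources are hypothetical.

```python
import io
import tokenize


def declared_encoding(source):
    """Return the encoding Python would use to decode this (hypothetical) source text."""
    readline = io.BytesIO(source.encode('ascii')).readline
    encoding, _lines = tokenize.detect_encoding(readline)
    return encoding


# Declaration on line 1, in the same wording as above.
on_line_1 = '# This Python file uses the following encoding: utf-8\nprint("hi")\n'

# Declaration on line 2, after a shebang comment; still honored.
on_line_2 = '#!/usr/bin/env python\n# -*- coding: latin-1 -*-\n'

# First line is code, so the later declaration is ignored and the default applies.
too_late = 'import sys\n# -*- coding: latin-1 -*-\n'

for name, src in [('line 1', on_line_1), ('line 2', on_line_2), ('too late', too_late)]:
    print(name, declared_encoding(src))
```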
The other is to strip out all non-ASCII characters: take the decoded text and re-encode it as ASCII, discarding anything that can't be represented.
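A minimal Python 3 sketch of that second workaround, assuming the text is already a decoded `str` (e.g. from `get_text()`). The helper names are hypothetical. The `asciify_quotes` variant is a swapped-in, gentler alternative rather than the approach described above: it maps typographic quotes to their ASCII equivalents first, so apostrophes survive instead of vanishing.

```python
def strip_non_ascii(text):
    """Hypothetical helper: encode to ASCII, silently dropping what can't be represented."""
    return text.encode('ascii', 'ignore').decode('ascii')


# Alternative (an assumption, not the original method): translate curly quotes first.
QUOTE_MAP = str.maketrans({
    '\u2018': "'",  # left single quotation mark
    '\u2019': "'",  # right single quotation mark / apostrophe
    '\u201c': '"',  # left double quotation mark
    '\u201d': '"',  # right double quotation mark
})


def asciify_quotes(text):
    """Replace typographic quotes with ASCII ones, then drop any remaining non-ASCII."""
    return strip_non_ascii(text.translate(QUOTE_MAP))


sample = '\u201cIt\u2019s over,\u201d she said at the caf\u00e9.'
print(strip_non_ascii(sample))  # Its over, she said at the caf.
print(asciify_quotes(sample))   # "It's over," she said at the caf.
```

Note that both versions still lose accented letters such as the "é" in the sample, which is the cost of forcing everything into ASCII.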