Cody Reandeau
Python Question

Extracting raw html from locally saved html file using BeautifulSoup

Relatively new to BeautifulSoup. I'm trying to get the raw HTML from a locally saved HTML file. From what I've read, BeautifulSoup seems like the right tool for this, but when I do this:

from bs4 import BeautifulSoup
url = r"C:\example.html"
soup = BeautifulSoup(url, "html.parser")
text = soup.get_text()
print (text)

An empty string is printed out. I assume I'm missing some step. Any nudge in the right direction would be greatly appreciated.

Answer

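The first argument to BeautifulSoup is the HTML markup itself (a string or an open file object), not a file path or URL. In your code the path string is what gets parsed as if it were the markup, so the file is never read. Open the file, read its contents, and pass that in.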
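Something along these lines should work. This is a minimal sketch: the helper name load_html and the UTF-8 default are my own assumptions, so adjust the encoding to match how the file was saved.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def load_html(path, encoding="utf-8"):
    """Read a locally saved HTML file and return (raw_html, soup).

    load_html is a hypothetical helper for illustration; the encoding
    default is an assumption and may need changing for your file.
    """
    # Read the file contents; this string is already the raw HTML.
    raw_html = Path(path).read_text(encoding=encoding)
    # Pass the markup (not the path) to BeautifulSoup.
    soup = BeautifulSoup(raw_html, "html.parser")
    return raw_html, soup


if __name__ == "__main__":
    raw_html, soup = load_html(r"C:\example.html")
    print(raw_html)          # the raw HTML as saved on disk
    print(soup.get_text())   # only the text content, tags stripped
```

Note that if all you need is the raw HTML, reading the file is enough. BeautifulSoup only comes into play once you want to search or extract parts of the document.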

