ivila - 3 months ago

BeautifulSoup select function works differently between Python 3.5.2 and Python 3.4.2

Problem: I have an HTML file that contains several tags, and I want to find a table tag whose class attribute is 'targets'. Using BeautifulSoup 4.5.1, this works fine on Python 3.5.2 (macOS Sierra) but does not work on Python 3.4.2 (Raspberry Pi). I want to figure out why.

Here is the example HTML file (test.html):

<!DOCTYPE html>
<table class="maincontainer">
<table class="main">
<td class="embedded">
<table class="targets"></table>

And here is what I wrote in the Python file:

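from bs4 import BeautifulSoup
with open('test.html', 'rt', encoding='utf-8') as f:
    soup = BeautifulSoup(f.read())  # no parser name is passed here
print(soup.select('table.targets'))  # look for <table class="targets">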

So can anyone answer the following questions:

  • How does the select function work?

  • Why does this not work in 3.4.2 but work in 3.5.2?

  • Is there any way to handle this problem?


This is because of the different modules installed in your 3.5 and 3.4 Python environments. When you don't pass a desired parser name explicitly:

soup = BeautifulSoup(str)

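BeautifulSoup picks the parser automatically from the installed modules. If lxml is installed, it picks that; if not, it picks html5lib; and if that is not installed either, it falls back to the built-in html.parser. These parsers repair malformed markup (such as the unclosed tags in test.html) differently, so the same select call can return different results. As the documentation puts it: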

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser.
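To see which parser each environment would end up with, a small standard-library check can help. This is only a sketch of the ranking quoted above, not BeautifulSoup's actual selection code; the names RANKING, pick_parser and installed_parsers are made up for illustration.

```python
import importlib.util

# Preference order as described in the documentation quote:
# lxml first, then html5lib, then Python's built-in parser.
RANKING = ["lxml", "html5lib", "html.parser"]


def pick_parser(installed):
    """Return the first parser in RANKING that is available.

    `installed` is a set of importable third-party parser modules.
    html.parser ships with Python, so it is always the final fallback.
    """
    for name in RANKING:
        if name == "html.parser" or name in installed:
            return name


def installed_parsers():
    """Detect which optional parser modules can be imported here."""
    candidates = ("lxml", "html5lib")
    return {m for m in candidates if importlib.util.find_spec(m) is not None}


if __name__ == "__main__":
    # Run this under both interpreters and compare the printed names.
    print(pick_parser(installed_parsers()))
```

If the two interpreters print different names, that would be consistent with the explanation above: the same BeautifulSoup call is handing the markup to different parsers.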

In other words, you should specify the parser explicitly to avoid this kind of problem in the future. Determine which one works for your particular case and set it:

soup = BeautifulSoup(str, "html5lib")
# or soup = BeautifulSoup(str, "lxml")
# or soup = BeautifulSoup(str, "html.parser")