ivila ivila - 11 days ago 5
HTML Question

BeautifulSoup select function works differently in Python 3.5.2 and Python 3.4.2

Problem: I have an HTML file containing several tags, and I want to find the table tag whose class attribute has the value 'targets'. Using BeautifulSoup 4.5.1, this works fine in Python 3.5.2 (macOS Sierra) but does not work in Python 3.4.2 (Raspberry Pi). I want to figure out why.

Here is the example HTML file (test.html):

<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<table class="maincontainer">
<tbody>
<tr>中文</tr>
<tr>
<td>
<table class="main">
<tbody>
<tr>
<td class="embedded">
<td></td>
<table class="targets"></table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>


And here is the Python code:

from bs4 import BeautifulSoup

# Read the test page as UTF-8 text
with open('test.html', 'rt', encoding='utf-8') as f:
    str = f.read()

# No parser is named here, so BeautifulSoup chooses one itself
soup = BeautifulSoup(str)
table = soup.select('table[class="targets"]')


Can anyone help with the following questions:


  • How does the select function work?

  • Why does this fail in 3.4.2 but work in 3.5.2?

  • Is there a way to fix this problem?


Answer

This happens because different parser modules are installed in your Python 3.5 and 3.4 environments. When you don't pass a parser name explicitly:

soup = BeautifulSoup(str)

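BeautifulSoup picks a parser automatically from the modules that are installed. If lxml is installed, it picks lxml; if not, it picks html5lib; and if that is not installed either, it falls back to the built-in html.parser. These parsers build different trees from invalid markup, such as the td nested directly inside another td in test.html, so the same selector can match under one parser and not under another. The documentation describes the ranking: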

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser.
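To see which parser an unqualified BeautifulSoup(str) call is likely to pick on each machine, you can check which of the three parser modules are importable there. The sketch below is only an approximation: it mirrors the ranking quoted above using the standard library's importlib, it is not BeautifulSoup's own lookup code, and the helper name installed_parsers is made up for illustration.

```python
import importlib.util

# Parser names as passed to BeautifulSoup, in the preference order quoted
# above, paired with the module each one needs. html.parser ships with Python.
PARSER_MODULES = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", "html.parser"),
]


def installed_parsers():
    """Return the parser names whose backing module can be imported."""
    found = []
    for name, module in PARSER_MODULES:
        if importlib.util.find_spec(module) is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    parsers = installed_parsers()
    print("installed parsers, best first:", parsers)
    print("likely default:", parsers[0])
```

Running this under both interpreters and comparing the two lists should show whether the environments differ in the way described above.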

In other words, name the parser explicitly to avoid this kind of problem in the future. Determine which one works for your particular case and set it:

soup = BeautifulSoup(str, "html5lib")
# or soup = BeautifulSoup(str, "lxml")
# or soup = BeautifulSoup(str, "html.parser")
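One way to determine which parser works for your case is to run the same selector through each parser and compare the results. This is a minimal sketch: the helper name count_matches is hypothetical, MARKUP is a shortened copy of the nested tables from test.html, and no particular counts are claimed here because they depend on how each installed parser repairs the markup.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened copy of the nested tables from test.html (an assumption made
# so the example does not depend on a file being present)
MARKUP = """
<table class="main"><tbody><tr>
<td class="embedded">
<td></td>
<table class="targets"></table>
</td>
</tr></tbody></table>
"""


def count_matches(markup, selector, parser):
    """Return how many elements match selector, or None if parser is missing."""
    try:
        soup = BeautifulSoup(markup, parser)
    except FeatureNotFound:
        # Raised by BeautifulSoup when the requested parser is not installed
        return None
    return len(soup.select(selector))


if __name__ == "__main__":
    for parser in ("lxml", "html5lib", "html.parser"):
        print(parser, count_matches(MARKUP, 'table[class="targets"]', parser))
```

A parser that reports None is not installed in that environment. Once you know which parser finds the table, pass its name in every BeautifulSoup call so that both machines behave the same way.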