Kim Hyesung - 1 year ago 252
HTML Question

Python BeautifulSoup: How to write the output to an HTML file

I modify an HTML file by removing some of its tags with BeautifulSoup, and then I want to write the result back to an HTML file.
My code:

from bs4 import BeautifulSoup
from bs4 import Comment

soup = BeautifulSoup(open('1.html'),"html.parser")

[x.extract() for x in soup.find_all('script')]
[x.extract() for x in soup.find_all('style')]
[x.extract() for x in soup.find_all('meta')]
[x.extract() for x in soup.find_all('noscript')]
[x.extract() for x in soup.find_all(text=lambda text:isinstance(text, Comment))]
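# Print each top-level node; tags and their text stay on one line
html = soup.contents
for i in html:
    print(i)

# prettify() re-indents the tree, putting each tag and text node on its own line
html = soup.prettify("utf-8")
with open("output1.html", "wb") as file:
    file.write(html)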

But because I use soup.prettify, it generates HTML like this:

- Tradisi pedang pora mewarnai serah terima jabatan pejabat di
<a href="" title="Polres">
<a href="" title="Bintan">
, Senin (3/10/2016).

But I need the result to look like the output of the print loop, like this:

<p><strong>BATAM.TRIBUNNEWS.COM, BINTAN</strong> - Tradisi pedang pora mewarnai serah terima jabatan pejabat di <a href="" title="Polres">Polres</a> <a href="" title="Bintan">Bintan</a>, Senin (3/10/2016).</p>
<p>Empat perwira baru Senin itu diminta cepat bekerja. Tumpukan pekerjaan rumah sudah menanti di meja masing masing.</p>

So how can I make the written file match the printed output exactly, with each tag and its content on the same line? Thanks.

Answer Source

Just convert the soup instance to a string and write it:

with open("output1.html", "w") as file:
    file.write(str(soup))
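For comparison, here is a minimal end-to-end sketch of the same idea, assuming Python 3 and a recent bs4 release. The SAMPLE markup and the clean and write_html helper names are made up for illustration and stand in for the real 1.html and the question's code. It shows that str(soup) keeps each tag and its text on one line, while prettify() adds the extra line breaks.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup standing in for the contents of 1.html
SAMPLE = (
    '<p><strong>BINTAN</strong> - text '
    '<a href="" title="Polres">Polres</a>, Senin.</p>'
    '<script>var x = 1;</script><!-- a comment -->'
)


def clean(markup):
    """Parse markup and drop script/style/meta/noscript tags and comments."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(["script", "style", "meta", "noscript"]):
        tag.extract()
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return soup


def write_html(soup, path):
    """Write the soup without prettify(), so no extra line breaks are added."""
    # An explicit encoding avoids depending on the platform default
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(soup))


soup = clean(SAMPLE)
compact = str(soup)       # tags and their text stay on one line
pretty = soup.prettify()  # each tag and text node on its own indented line

write_html(soup, "output1.html")
```

If the file must be written in binary mode, soup.encode("utf-8") should give the same compact markup as bytes.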