Kim Hyesung - 3 months ago
HTML Question

Python BeautifulSoup: how to write the output to an HTML file

I modify an HTML file by removing some of its tags with BeautifulSoup, and then I want to write the result back to an HTML file.
My code:

from bs4 import BeautifulSoup
from bs4 import Comment

soup = BeautifulSoup(open('1.html'),"html.parser")

[x.extract() for x in soup.find_all('script')]
[x.extract() for x in soup.find_all('style')]
[x.extract() for x in soup.find_all('meta')]
[x.extract() for x in soup.find_all('noscript')]
[x.extract() for x in soup.find_all(text=lambda text:isinstance(text, Comment))]
html = soup.contents
for i in html:
    print(i)

html = soup.prettify("utf-8")
with open("output1.html", "wb") as file:
    file.write(html)

But since I use soup.prettify, it generates HTML like this:

<p>
 <strong>
  BATAM.TRIBUNNEWS.COM, BINTAN
 </strong>
 - Tradisi pedang pora mewarnai serah terima jabatan pejabat di
 <a href="" title="Polres">
  Polres
 </a>
 <a href="" title="Bintan">
  Bintan
 </a>
 , Senin (3/10/2016).
</p>

But I need the result to look like what the print loop shows, like this:

<p><strong>BATAM.TRIBUNNEWS.COM, BINTAN</strong> - Tradisi pedang pora mewarnai serah terima jabatan pejabat di <a href="" title="Polres">Polres</a> <a href="" title="Bintan">Bintan</a>, Senin (3/10/2016).</p>
<p>Empat perwira baru Senin itu diminta cepat bekerja. Tumpukan pekerjaan rumah sudah menanti di meja masing masing.</p>

So how can I make the written file exactly the same as the printed output, so that each tag and its content stay on the same line? Thanks.


Just convert the soup instance to string and write:

with open("output1.html", "w") as file:
    file.write(str(soup))
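
For a fuller picture, here is a minimal, self-contained sketch of the same idea. It assumes Python 3 and a recent bs4. The SAMPLE_HTML string, the strip_unwanted and write_compact helper names, and the temporary output path are made up for illustration and stand in for your 1.html and output1.html.

```python
import os
import tempfile

from bs4 import BeautifulSoup, Comment

# Hypothetical sample input standing in for the contents of 1.html.
SAMPLE_HTML = (
    '<html><head><meta charset="utf-8"><style>p {color: red}</style></head>'
    '<body><!-- a comment --><script>var x = 1;</script>'
    '<p><strong>BATAM.TRIBUNNEWS.COM, BINTAN</strong> - Tradisi pedang pora '
    '<a href="" title="Polres">Polres</a> '
    '<a href="" title="Bintan">Bintan</a>, Senin (3/10/2016).</p>'
    '</body></html>'
)


def strip_unwanted(html):
    """Parse html and drop script/style/meta/noscript tags and comments."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["script", "style", "meta", "noscript"]):
        tag.extract()
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return soup


def write_compact(soup, path):
    """Write the soup without prettify() so inline tags stay on one line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(soup))


soup = strip_unwanted(SAMPLE_HTML)

# Write to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "output1.html")
write_compact(soup, out_path)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

str(soup) serializes the tree with the whitespace it had when parsed, while prettify() puts every tag and string on its own line. If you would rather write bytes, opening the file with "wb" and writing soup.encode("utf-8") should give the same compact layout.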