Suhail Suhail - 4 years ago 422
Python Question

Failing to write output of HtmlDiff containing unicode text in python 3

I am trying to compare two Arabic strings using Python's difflib module. I have looked at various ways of writing Unicode text to a file in Python, but none seems to work for me. Here is what I have tried so far:

Note: in all subsequent code snippets, original and mockinputs are lists of strings (as required by HtmlDiff) containing Unicode text, specifically Arabic.

Method 1


import difflib

hdiff = difflib.HtmlDiff()
html = hdiff.make_file(original, mockinputs)

with open('out_file.html', 'w', encoding='utf-8') as out_file:
    out_file.write(html)


This runs without error, but the HTML file it creates shows gibberish (things like الرحÙ) when opened in a browser.

Method 2


import difflib

htmldiff = difflib.HtmlDiff()
html = htmldiff.make_file(original, mockinputs)

out_file = open('out_file.html', 'w', encoding='utf-8')
out_file.write(html)
out_file.close()


This too runs without error, and the HTML file is also gibberish.

Method 3 (as pointed out here)

import difflib

htmldiff = difflib.HtmlDiff()
html = htmldiff.make_file(original, mockinputs)

out_file = open('out_file.html', 'w')
out_file.write(html.encode('utf-8'))
out_file.close()


This gives me this error:


TypeError: must be str, not bytes


So, how can I write Unicode text produced as shown here to an HTML file in Python 3?

Answer Source

According to the documentation, the make_file method in versions of Python before 3.5 always declared a charset of ISO-8859-1 in the generated page, and that charset does not cover Arabic.

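Further, the browser trusts that ISO-8859-1 declaration and decodes the UTF-8 bytes in your file one byte at a time, which is what produces the gibberish. Thus, you either have to run the code on Python 3.5 or later, where make_file declares utf-8 by default, or generate the HTML output that you would like in a different way.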
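As a minimal sketch of the Python 3.5+ route, the snippet below passes the charset to make_file explicitly and writes the file with the same encoding, so the declaration in the page matches the bytes on disk. The two sample lists are placeholders I made up to stand in for the question's original and mockinputs.

```python
import difflib

# Placeholder inputs (assumption): stand-ins for the question's lists of Arabic strings.
original = ["بسم الله الرحمن الرحيم"]
mockinputs = ["بسم الله الرحمن"]

hdiff = difflib.HtmlDiff()

# charset is a keyword-only argument added in Python 3.5; it defaults to 'utf-8',
# but spelling it out keeps the intent visible.
html = hdiff.make_file(original, mockinputs, charset='utf-8')

# Write with the same encoding that the page declares in its meta tag.
with open('out_file.html', 'w', encoding='utf-8') as out_file:
    out_file.write(html)
```

On an interpreter older than 3.5 the charset argument does not exist. One possible workaround there, which I have not verified on every old release, is to replace the declared charset in the returned string before writing it.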

Edit: as of Python 3.5.1, although the make_file method declares utf-8 by default, its sibling method make_table doesn't. It returns only the table markup, with no charset declaration at all, so take care when using the latter!
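If you do want make_table, one way to handle this is to wrap its output in a page that declares the charset itself. This is only a sketch: the page wrapper, the out_table.html file name and the sample lists are all assumptions of mine, and the bare table comes without the styles and legend that make_file adds.

```python
import difflib

# Placeholder inputs (assumption): stand-ins for the question's lists of Arabic strings.
original = ["بسم الله الرحمن الرحيم"]
mockinputs = ["بسم الله الرحمن"]

# make_table returns only the <table> markup, without <html>, <head> or a charset.
table = difflib.HtmlDiff().make_table(original, mockinputs)

# Hypothetical minimal wrapper page that declares utf-8 explicitly.
page = (
    '<!DOCTYPE html>\n'
    '<html>\n<head>\n<meta charset="utf-8">\n</head>\n'
    '<body>\n' + table + '\n</body>\n</html>\n'
)

# The file encoding matches the charset declared in the wrapper.
with open('out_table.html', 'w', encoding='utf-8') as out_file:
    out_file.write(page)
```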
