camomilk camomilk - 6 months ago 19
Python Question

How to pad and align unicode strings with special characters in Python?

Python makes it easy to pad and align ASCII strings, like so:

>>> print "%20s and stuff" % ("test")
                test and stuff
>>> print "{:>20} and stuff".format("test")
                test and stuff


But how can I properly pad and align unicode strings containing special characters? I've tried several methods, but none of them seem to work:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def manual(data):
    for s in data:
        size = len(s)
        print ' ' * (20 - size) + s + " stuff"

def with_format(data):
    for s in data:
        print " {:>20} stuff".format(s)

def with_oldstyle(data):
    for s in data:
        print "%20s stuff" % (s)

if __name__ == "__main__":
    data = ("xTest1x", "ツTestツ", "♠️ Test ♠️", "~Test2~")
    data_utf8 = map(lambda s: s.decode("utf8"), data)

    print "with_format"
    with_format(data)
    print "with_oldstyle"
    with_oldstyle(data)
    print "with_oldstyle utf8"
    with_oldstyle(data_utf8)
    print "manual:"
    manual(data)
    print "manual utf8:"
    manual(data_utf8)


This gives varied output:

with_format
xTest1x stuff
ツTestツ stuff
♠️ Test ♠️ stuff
~Test2~ stuff
with_oldstyle
xTest1x stuff
ツTestツ stuff
♠️ Test ♠️ stuff
~Test2~ stuff
with_oldstyle utf8
xTest1x stuff
ツTestツ stuff
♠️ Test ♠️ stuff
~Test2~ stuff
manual:
xTest1x stuff
ツTestツ stuff
♠️ Test ♠️ stuff
~Test2~ stuff
manual utf8:
xTest1x stuff
ツTestツ stuff
♠️ Test ♠️ stuff
~Test2~ stuff


This is using Python 2.7.

Answer

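There is a wcwidth module available via pip. Its wcswidth() function reports how many terminal columns a string occupies, which is the value to pad against instead of len().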
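To see why the attempts in the question misalign, it may help to compare what len() actually measures. The sketch below uses Python 3 syntax and only the standard library (the variable names are illustrative). It counts UTF-8 bytes and code points for one of the sample strings. Neither number matches the 8 columns that a string with two double-width kana would typically take up in a terminal.

```python
s = "ツTestツ"

# On a Python 2 byte string, len() counts UTF-8 bytes (3 per kana here).
byte_len = len(s.encode("utf-8"))

# On a unicode string, len() counts code points, ignoring display width.
codepoint_len = len(s)

print(byte_len, codepoint_len)
```

Padding computed from either count will be off whenever a character is wider or narrower than one column.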

test.py:

# -*- coding: utf-8 -*-
import wcwidth

def manual_wcwidth(data):
    for s in data:
        # wcswidth() returns the number of terminal columns s occupies
        size = wcwidth.wcswidth(s)
        print ' ' * (20 - size) + s + " stuff"

data = (u"xTest1x", u"ツTestツ", u"♠️ Test ♠️", u"~Test2~")
manual_wcwidth(data)

In the Linux console, this script yields perfectly aligned lines for me:

console screenshot

However, when I run the script in PyCharm, the line with kana is still shifted one character to the left, so the result also seems to depend on the font and renderer:

pycharm screenshot
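If installing wcwidth is not an option, a rough approximation can be built on the standard library's unicodedata module. The sketch below is not wcwidth's algorithm, and `display_width` and `rjust_display` are hypothetical helper names. It assumes wide and fullwidth characters take two columns, and that combining marks and format characters (including variation selectors) take none. It is written in Python 3 syntax. The same font and renderer caveat applies.

```python
import unicodedata

def display_width(text):
    """Approximate how many terminal columns text occupies (assumption-based)."""
    width = 0
    for ch in text:
        # Combining marks, enclosing marks and format characters draw no cell.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters usually take two cells.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

def rjust_display(text, columns):
    """Right-align text in a field of the given column count."""
    return " " * max(0, columns - display_width(text)) + text

for s in ("xTest1x", "ツTestツ", "~Test2~"):
    print(rjust_display(s, 20) + " stuff")
```

Characters with ambiguous East Asian width are counted as one column here. Some terminals draw them differently, so a maintained library such as wcwidth is likely the safer choice where it is available.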