SamH - 3 years ago
Python Question

Dumping unicode with YAML

I'm creating YAML files from CSVs that contain a lot of Unicode characters, but I can't get the library to dump them without raising a UnicodeDecodeError.

I'm using the ruamel.yaml library.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 11: ordinal not in range(128)
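The byte and position in this message can be reproduced without YAML at all. The following is a minimal sketch (Python 3 standard library only, not ruamel.yaml code) showing that 0xc2 at position 11 is the first byte of the UTF-8 encoding of ©, which comes right after the 11 characters of HELLO_WORLD, and that the ASCII codec is what rejects it. This suggests that somewhere UTF-8 bytes are being decoded with Python 2's default ASCII codec.

```python
# The value from the question; \u00a9 is the copyright sign.
text = u"HELLO_WORLD\u00a9"

# UTF-8 encodes the copyright sign as two bytes, 0xC2 0xA9,
# right after the 11 ASCII characters of "HELLO_WORLD".
data = text.encode("utf-8")
print(data)  # b'HELLO_WORLD\xc2\xa9'

# The ASCII codec rejects the first byte above 127, reproducing the message.
error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc2 in position 11: ordinal not in range(128)
```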


I've tried plain strings, unicode strings, and encoding with "utf-8", but nothing seems to work. I've seen a lot of examples that solve the issue by adding a representer, but they all use the old ruamel.yaml API, and I can't find documentation anywhere on how to do that with the new one.

from ruamel.yaml import YAML

class YamlObject(YAML):
    def __init__(self):
        YAML.__init__(self)
        self.default_flow_style = False
        self.block_seq_indent = 2
        self.indent = 4
        self.allow_unicode = True

textDict = {"text": u"HELLO_WORLD©"}
textFile = "D:\\testFile.yml"
yaml = YamlObject()
with open(textFile, "w") as f:
    yaml.dump(textDict, f)


I can convert the entire dict to a unicode string and that works, but it doesn't give me the format I need.

What I need is just:

text: HELLO_WORLD©


How can I do that?

Answer

You're missing an encoding setting in your derived YAML class.

Try this:

from ruamel.yaml import YAML

class YamlObject(YAML):
    def __init__(self):
        YAML.__init__(self)
        self.default_flow_style = False
        self.block_seq_indent = 2
        self.indent = 4
        self.allow_unicode = True
        self.encoding = 'utf-8'

If you look at the definition of the base class, YAML, you'll notice that encoding is set to None by default:

self.encoding = None

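That usually isn't a problem, because allow_unicode also defaults to None in the dump() method. Without allow_unicode, non-ASCII characters are written as escape sequences in double-quoted scalars, so the output is plain ASCII and no encoding is needed.

Since you set allow_unicode = True, the raw characters are written out, so you also need to set encoding.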
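The same principle applies outside ruamel.yaml: once raw non-ASCII characters reach the output stream, something has to encode them with a codec that can represent them. The following is a minimal stdlib-only sketch (Python 3 plain text writes, not ruamel.yaml calls; the temporary path is a hypothetical stand-in for the question's D:\testFile.yml). It writes the desired line with an ASCII encoding, which fails, and then with UTF-8, which succeeds.

```python
import tempfile
from pathlib import Path

# The line the question wants in the file; \u00a9 is the copyright sign.
line = u"text: HELLO_WORLD\u00a9\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the question's D:\testFile.yml.
    path = Path(tmp) / "testFile.yml"

    # An ASCII-only stream cannot hold the raw character.
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write(line)
        ascii_ok = True
    except UnicodeEncodeError:
        ascii_ok = False

    # With an explicit UTF-8 encoding the same write succeeds.
    with open(path, "w", encoding="utf-8") as f:
        f.write(line)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(ascii_ok)            # False
print(round_trip == line)  # True
```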
