Rafael Colucci - 11 days ago
YAML Question

Yaml load converting string to UTF8?

I have this YAML:

---
test: {"gender":0,"nacionality":"Alem\u00e3o"}


I am reading it using Python 3.5 as follows:

import yaml

with open('teste.yaml', 'r') as stream:
    # safe_load_all is lazy, so iterate while the file is still open
    doc = yaml.safe_load_all(stream)
    for line in doc:
        print(line)


This is the result I get:

{'test': {'gender': 0, 'nacionality': 'Alemão'}}


But if I change " to ' in my YAML, I get this:

{'test': {'nacionality': 'Alem\\u00e3o', 'gender': 0}}


As you can see, when I use " the escape in Alem\u00e3o is converted to the character ã, but with ' it is not.

So I have two questions:

Why do I get different outputs when I use ' and "?

What can I do to get the output as Alem\\u00e3o when using "?

Answer

That's how the YAML data format is defined. Within double quotes, backslash escape sequences such as \u00e3 are interpreted. Within single quotes, they're not: a backslash is just a literal character.

7.3.1. Double-Quoted Style

The double-quoted style is specified by surrounding “"” indicators. This is the only style capable of expressing arbitrary strings, by using “\” escape sequences. This comes at the cost of having to escape the “\” and “"” characters.

http://yaml.org/spec/1.2/spec.html#id2787109
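
To see the two quoting styles side by side, here is a minimal sketch. It assumes PyYAML is installed and, for brevity, parses inline strings with yaml.safe_load instead of reading teste.yaml; the variable names are only illustrative.

```python
import yaml

# Raw Python strings, so the backslash reaches the YAML parser untouched.
double_src = r'test: {"gender": 0, "nacionality": "Alem\u00e3o"}'
single_src = r"test: {'gender': 0, 'nacionality': 'Alem\u00e3o'}"

# Double-quoted scalar: the YAML parser decodes the \u00e3 escape.
double_value = yaml.safe_load(double_src)["test"]["nacionality"]

# Single-quoted scalar: the backslash sequence is kept as literal text.
single_value = yaml.safe_load(single_src)["test"]["nacionality"]

print(double_value, len(double_value))  # Alemão 6
print(single_value, len(single_value))  # Alem\u00e3o 11
```

The lengths make the difference visible: the double-quoted value holds the single character ã, while the single-quoted value holds the six characters \, u, 0, 0, e, 3 in its place.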


What can I do to get the output as Alem\u00e3o when using "?

Escape the escape character:

test: {"gender":0,"nacionality":"Alem\\u00e3o"}
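
A quick way to check this, again as a sketch that assumes PyYAML and parses an inline string rather than the file:

```python
import yaml

# Raw string: YAML sees the two backslashes exactly as written in the file.
escaped_src = r'test: {"gender": 0, "nacionality": "Alem\\u00e3o"}'

value = yaml.safe_load(escaped_src)["test"]["nacionality"]

print(value)        # Alem\u00e3o   (the string holds a single backslash)
print(repr(value))  # 'Alem\\u00e3o' (repr doubles the backslash for display)
```

The second line of output also explains the \\ in the printed dictionary from the question: printing a dict shows the repr of each string, so one real backslash is displayed as two.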