Asked by patrick, 1 year ago

Using unicode / umlauts in Python: Dictionary v manual input

I am using a dictionary to store some character pairs in Python (I am replacing 'ae', 'ue' and 'oe' with the corresponding umlaut characters). Here is what it looks like:

umlautdict = {
    'ae': 'ä',
    'ue': 'ü',
    'oe': 'ö'
}

Then I run my input words through it like so:

for item in umlautdict.keys():

But this does not do anything (no replacement happens). When I printed out my umlautdict, I saw that it looks like this:

{'ue': '\xfc', 'oe': '\xf6', 'ae': '\xc3\xa4'}

Of course that is not what I want; however, trying things like
(--> Error) or pre-fixing
did not improve things.
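The printed dictionary holds two different byte encodings of umlauts. '\xfc' and '\xf6' are the single Latin-1 bytes for 'ü' and 'ö', while '\xc3\xa4' is the two-byte UTF-8 sequence for 'ä'. In Python 2.7 a plain '...' literal is a byte string, so the dictionary keeps whatever bytes ended up in the file, and bytes in one encoding will not match text stored in the other. Here is a minimal Python 3 sketch that makes this visible. In Python 3, str is Unicode and bytes is a separate type. The names encodings and text_latin1 are only for illustration:

```python
# Python 3 sketch: the same letter has a different byte representation
# in each encoding.
encodings = {ch: (ch.encode("utf-8"), ch.encode("latin-1")) for ch in "äüö"}
for ch, (utf8, latin1) in encodings.items():
    print(ch, "UTF-8:", utf8, "Latin-1:", latin1)

# A UTF-8 search key is never found inside Latin-1 encoded bytes,
# which is one way a byte-level replace can silently do nothing.
text_latin1 = "über".encode("latin-1")       # b'\xfcber'
print("ü".encode("utf-8") in text_latin1)    # False
```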

If I type the 'ä' or 'ö' into the
command by hand, everything works just fine. I also changed the settings in my script (working in TextWrangler) to
# -*- coding: utf-8 -*-
as it would not even let me execute the script containing umlauts without it.
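The coding line only tells the parser how to decode the bytes of the source file. It does not convert anything, so it has to match the encoding the editor actually saves with. Without it, Python 2.7 assumes ASCII and rejects the umlauts, which is why the script refused to run. The small Python 3 sketch below compiles the same UTF-8 bytes once under a matching declaration and once under a mismatched one. The variable names are illustrative:

```python
# Python 3 sketch: compile() honours the coding declaration when it is
# given bytes, just like the interpreter does when it reads a file.
utf8_source = "# -*- coding: utf-8 -*-\nword = 'ü'\n".encode("utf-8")

# Declaration matches the bytes: the literal is read back as 'ü'.
ns_ok = {}
exec(compile(utf8_source, "<matching>", "exec"), ns_ok)
print(ns_ok["word"])   # ü

# Same UTF-8 bytes, but the header claims Latin-1: each of the two
# UTF-8 bytes of 'ü' is decoded as a separate Latin-1 character.
wrong_source = utf8_source.replace(b"utf-8", b"latin-1")
ns_bad = {}
exec(compile(wrong_source, "<mismatched>", "exec"), ns_bad)
print(ns_bad["word"])  # Ã¼
```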

So I don't get...

  • Why does this happen? Why and when do the umlauts change from "good
    to evil" when I store them in the dictionary?

  • How do I fix it?

  • Also, if anyone knows: what is a good resource to learn about
    encoding in Python? I run into issues all the time, and so many
    things don't make sense to me or I can't wrap my head around them.

I'm working on a Mac in Python 2.7.10. Thanks for your help!

Answer
  1. Declare your coding.
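  2. Save the file as UTF-8 so the special characters in the literals match the declaration. (The r raw prefix used below only disables backslash escapes; it does not affect how 'ä' is stored.)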
  3. Iterate properly on your string: replace returns a new string rather than modifying the old one, so assign the result back and carry it into the next loop iteration.
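Below is a quick Python 3 comparison for step 3. The question's loop body is not shown, so the discarding version is only an assumption about what went wrong. Both function names are hypothetical:

```python
# Python 3 sketch: str.replace never changes a string in place.
umlautdict = {"ae": "ä", "ue": "ü", "oe": "ö"}

def replace_discarding(word):
    # Assumed buggy pattern: each result is thrown away, so `word` never changes.
    for item in umlautdict:
        word.replace(item, umlautdict[item])
    return word

def replace_accumulating(word):
    # Working pattern: each iteration starts from the previous result.
    for item in umlautdict:
        word = word.replace(item, umlautdict[item])
    return word

print(replace_discarding("haer ueber loess"))    # haer ueber loess
print(replace_accumulating("haer ueber loess"))  # här über löss
```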

Here's code to get the job done:

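# -*- coding: utf-8 -*-

# Python 2 byte-string literals: with the file saved as UTF-8,
# each umlaut is stored as its two-byte UTF-8 sequence.
umlautdict = {
    'ae': r'ä',
    'ue': r'ü',
    'oe': r'ö'
}

print umlautdict

inputword = "haer ueber loess"
for item in umlautdict.keys():
    # replace returns a new string, so keep the result for the next pass
    inputword = inputword.replace(item, umlautdict[item])

print inputword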


Output:

{'ue': '\xc3\xbc', 'oe': '\xc3\xb6', 'ae': '\xc3\xa4'}
här über löss
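If you would rather not juggle byte strings, a common alternative is to decode input to Unicode once, do all replacements on text, and encode only on output. This is not the approach used above. The Python 3 sketch below assumes the input arrives as UTF-8 bytes, and the function name transliterate_umlauts is hypothetical. In Python 2.7 the same idea uses u'...' literals and .decode('utf-8'):

```python
# Python 3 sketch of the "decode early, encode late" pattern.
UMLAUTS = {"ae": "ä", "ue": "ü", "oe": "ö"}  # text (Unicode) on both sides

def transliterate_umlauts(raw: bytes, encoding: str = "utf-8") -> bytes:
    # 1. Decode: turn incoming bytes into text exactly once.
    text = raw.decode(encoding)
    # 2. Work on text only: replacements compare characters, not bytes.
    for pair, umlaut in UMLAUTS.items():
        text = text.replace(pair, umlaut)
    # 3. Encode: turn text back into bytes only when writing it out.
    return text.encode(encoding)

result = transliterate_umlauts(b"haer ueber loess")
print(result)                   # b'h\xc3\xa4r \xc3\xbcber l\xc3\xb6ss'
print(result.decode("utf-8"))   # här über löss
print(transliterate_umlauts(b"ueber", encoding="latin-1"))  # b'\xfcber'
```

Passing encoding="latin-1" in the last call handles input saved in the other encoding without touching the dictionary.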