Python Question

Compare German umlaut in Python

I have a list of German words and I want to eliminate all nouns, so I check whether the first letter is uppercase or lowercase. This works for all words except those that begin with an umlaut, e.g. "Äpfel".

# -*- coding: utf-8 -*-
dictionary = open('dictionary/de.dict', 'r')

for line in dictionary:
    if line[0] == "Ä": # This does not work
        print "Ä found"


How can I make this work?

Answer Source

The UTF-8 encoded byte string "Ä" consists of two bytes, so line[0] is only the first of them and never equals the whole string:

>>> "Ä"
'\xc3\x84'

The unicode string u"Ä" is a single character. You have to decode the file contents into unicode strings. So if your dictionary is encoded in UTF-8, use:

import io
dictionary = io.open('dictionary/de.dict', encoding='utf8')
for line in dictionary:
    if line[0].isupper():
        print "Uppercase word", line
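
If you want to try the idea without the dictionary file, here is a rough sketch. The sample words and the helper name keep_lowercase_words are made up for illustration, and io.StringIO only stands in for the file opened with io.open(..., encoding='utf8'). It first shows the length difference between the text string and its UTF-8 bytes, then filters out the capitalized words:

```python
# -*- coding: utf-8 -*-
import io

text = u"Äpfel"             # text (unicode) string
raw = text.encode("utf-8")  # the same word as UTF-8 bytes

print(len(text))  # 5 characters
print(len(raw))   # 6 bytes, because "Ä" takes two bytes in UTF-8


def keep_lowercase_words(lines):
    """Return the words whose first character is not uppercase.

    Hypothetical helper: expects decoded text lines, not raw bytes.
    """
    kept = []
    for line in lines:
        word = line.strip()  # drop the trailing newline
        if word and not word[0].isupper():
            kept.append(word)
    return kept


# Assumption: in-memory sample data instead of 'dictionary/de.dict'.
sample = io.StringIO(u"Äpfel\nöffnen\nHaus\nüber\n")
result = keep_lowercase_words(sample)
```

Here result contains only u"öffnen" and u"über". Because the lines are decoded text, isupper() sees "Ä" as one uppercase character, so no special case for umlauts is needed.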