iCoder - 1 month ago
Python Question

Python: lower() vs. casefold() in string matching and converting to lowercase

How do I do a case insensitive string comparison in Python?

From what I understood from Google and the link above, both functions, lower() and casefold(), will convert the string to lowercase, but casefold() will also convert letters such as the German ß to ss.

All of that is about Greek letters, but my general questions are:


  • Are there any other differences?

  • Which one is better for converting to lowercase?

  • Which one is better for checking whether strings match?






Part 2:

firstString = "der Fluß"
secondString = "der Fluss"

# ß is equivalent to ss
if firstString.casefold() == secondString.casefold():
    print('The strings are equal.')
else:
    print('The strings are not equal.')


In the example above, should I use:

lower() # the strings compare as not equal, which makes sense to me


Or:

casefold() # ß is treated as ss, so the strings compare as equal.
# (Since I am a beginner, that still does not make sense to me:
# I see different strings.)

Answer Source

Case folding is a more aggressive version of lower() that is designed to make many of the less common Unicode characters comparable. It is another form of text normalization: strings that initially appear quite different can fold to the same result, because it takes the characters of many different languages into account.
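To make the difference concrete, here is a minimal sketch that runs both methods over a few sample strings. The sample strings and the `results` name are only chosen for illustration; the output assumes Python 3.3 or later, where str.casefold() is available.

```python
# Compare lower() and casefold() on a few illustrative strings:
# plain ASCII, the German sharp s, and the Greek final sigma.
samples = ["Hello World", "Fluß", "ς"]

# Map each sample to its (lower(), casefold()) pair.
results = {text: (text.lower(), text.casefold()) for text in samples}

for text, (lowered, folded) in results.items():
    print(repr(text), "lower:", repr(lowered), "casefold:", repr(folded))
```

For the ASCII string both methods agree, while ß and ς are left alone by lower() (they are already lowercase) but folded to ss and σ by casefold().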

I suggest you take a closer look at what case folding actually is; here is a good starting point: W3 Case Folding Wiki

To answer your other two questions: if you are working strictly in the English language, lower() and casefold() should yield exactly the same results. However, if you are trying to normalize text from other languages that use more than our simple 26-letter alphabet, I would use casefold() to compare your strings, as it will yield more consistent results. See also: elastic.co case folding
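As a rough sketch of that recommendation, the comparison can be wrapped in a small helper. The name equals_ignore_case is hypothetical, not something from the standard library, and the helper only applies casefold(); it does not attempt any further Unicode normalization.

```python
def equals_ignore_case(first: str, second: str) -> bool:
    """Return True if the two strings match after case folding (hypothetical helper)."""
    return first.casefold() == second.casefold()


# The strings from the question: casefold() treats ß as ss, lower() does not.
print(equals_ignore_case("der Fluß", "der Fluss"))    # True
print("der Fluß".lower() == "der Fluss".lower())      # False

# For plain English text both approaches agree.
print(equals_ignore_case("PYTHON", "python"))         # True
print("PYTHON".lower() == "python".lower())           # True
```

If the goal is only to display text in lowercase, lower() is the more conventional choice; casefold() is meant for matching, not for producing output that a reader will see.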