Tarvo Mäesepp - 1 year ago
iOS Question

Firebase or Swift not detecting umlauts

I found the weirdest thing in Firebase Database/Storage. I can't tell whether Firebase or Swift is the one failing to handle umlauts, e.g. ä, ö, ü.

I did some simple things with Firebase, like uploading images to Firebase Storage and then downloading them into
. Some of my
files had umlauts in the title for example(

So the problem occurs when I download them. The only time my download fails is if the file name contains the umlauts I mentioned.

So I tried some alternatives like in
ö - ö
. But this is not working. Can you guys suggest me something? I can't use
ö - o
ü - u

This is the code I use when trying to set some values into Firebase:

.downloadURLWithCompletion({ (url, error) in

    let resource = Resource(downloadURL: url!, cacheKey: productImageref)

Answer Source

Hooray for Unicode!

The short answer is that no, we're actually not doing anything special here. Basically all we do under the hood is:

// This is the list at https://cloud.google.com/storage/docs/json_api/ without the & because query parameters
NSString *const kGCSObjectAllowedCharacterSet = 

- (nullable NSString *)GCSEscapedString:(NSString *)string {
  NSCharacterSet *allowedCharacters =
      [NSCharacterSet characterSetWithCharactersInString:kGCSObjectAllowedCharacterSet];

  return [string stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacters];
}

What blows my mind is that:

let str1 = "o\u{308}" // decomposed : latin small letter o + combining diaeresis
let str2 = "\u{f6}"   // precomposed: latin small letter o with diaeresis

print(str1, str2, str1 == str2) // ö ö true

prints true. In Objective-C (which the Firebase Storage client is written in), it totally shouldn't, as the two strings are different sequences of code units (the length of str1 is 2 while the length of str2 is 1 in Obj-C, while in Swift the character count is 1 for both).

Apple must be normalizing strings before comparison in Swift (probably a reasonable thing to do, since otherwise it leads to bugs like this where strings are "the same" but compare differently). Turns out, this is exactly what they do (see the "Extended Grapheme Clusters" section of their docs).
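To make the difference visible, here is a minimal sketch using only the Swift standard library (the variable names are just for illustration). It compares the two spellings of ö at the level Swift's == works on and at the level of the underlying scalars and UTF-16 code units, the latter being what NSString's length counts:

```swift
let decomposed = "o\u{308}"   // o + combining diaeresis (two scalars)
let precomposed = "\u{f6}"    // single precomposed scalar

// Swift's == compares by canonical equivalence, so these are equal.
print(decomposed == precomposed)                        // true

// Both are a single user-perceived Character...
print(decomposed.count, precomposed.count)              // 1 1

// ...but the underlying Unicode scalars differ.
print(decomposed.unicodeScalars.count,
      precomposed.unicodeScalars.count)                 // 2 1

// utf16.count corresponds to NSString's length in Obj-C.
print(decomposed.utf16.count, precomposed.utf16.count)  // 2 1

// A scalar-by-scalar comparison sees two different sequences.
print(decomposed.unicodeScalars.elementsEqual(precomposed.unicodeScalars)) // false
```

So "equal" in Swift does not imply "same bytes" once the string leaves Swift.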

So, when you provide two different characters in Swift, they're being propagated to Obj-C as different characters and thus are encoded differently. Not a bug, just one of the many differences between Swift's String type and Obj-C's NSString type. When in doubt, choose a canonical representation you expect and stick with it, but as a library developer, it's very hard for us to choose that representation for you.
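As a rough illustration of "encoded differently", the sketch below percent-encodes two file names that Swift considers equal. Note the assumption: CharacterSet.urlPathAllowed is only a stand-in here, since the client's real allowed character set is not shown in full above, and the file names are made up for the example.

```swift
import Foundation

// Assumption: .urlPathAllowed stands in for the client's real allowed set.
let allowed = CharacterSet.urlPathAllowed

let precomposedName = "b\u{f6}rd.jpg"    // ö as one scalar (U+00F6)
let decomposedName = "bo\u{308}rd.jpg"   // o followed by U+0308

let encodedPrecomposed = precomposedName.addingPercentEncoding(withAllowedCharacters: allowed)
let encodedDecomposed = decomposedName.addingPercentEncoding(withAllowedCharacters: allowed)

// Swift sees the same string...
print(precomposedName == decomposedName)   // true

// ...but the percent-encoded forms are different ASCII strings,
// so they would name different objects on the server.
print(encodedPrecomposed ?? "nil")         // b%C3%B6rd.jpg
print(encodedDecomposed ?? "nil")          // bo%CC%88rd.jpg
```

If an upload used one spelling and the download uses the other, the lookup would presumably miss, which matches the symptom in the question.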

Thus, when naming files that contain Unicode characters, make sure to pick one standard normalization form (C, D, KC, or KD) and always use it when creating references.

import Foundation // provides decomposedStringWithCanonicalMapping
import Firebase

let imageName = "smorgasbörd.jpg"
let path = "images/\(imageName)"
let decomposedPath = path.decomposedStringWithCanonicalMapping // Unicode Form D
let ref = FIRStorage.storage().reference().child(decomposedPath)
// use this ref and you'll always get the same objects
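If you want to check that normalizing really removes the ambiguity, a small sketch without any Firebase dependency might look like this. canonicalStoragePath is a hypothetical helper name, not part of the Firebase SDK, and the paths are made up:

```swift
import Foundation

// Hypothetical helper (not a Firebase API): normalize a path to
// Unicode Form D before using it to build a storage reference.
func canonicalStoragePath(_ path: String) -> String {
    return path.decomposedStringWithCanonicalMapping
}

let pathA = "images/b\u{f6}rd.jpg"    // precomposed ö
let pathB = "images/bo\u{308}rd.jpg"  // decomposed ö

// Before normalization the scalar sequences differ.
print(pathA.unicodeScalars.elementsEqual(pathB.unicodeScalars))  // false

// After normalization both spellings yield the same scalars.
let normalizedA = canonicalStoragePath(pathA)
let normalizedB = canonicalStoragePath(pathB)
print(normalizedA.unicodeScalars.elementsEqual(normalizedB.unicodeScalars))  // true

// The compatibility forms (KC, KD) go further and also fold characters
// such as the "fi" ligature (U+FB01) into plain letters.
let ligature = "\u{fb01}"
print(ligature.precomposedStringWithCompatibilityMapping.unicodeScalars.count)  // 2
```

Form C (precomposedStringWithCanonicalMapping) would work just as well as Form D; what matters is using the same one for uploads and downloads.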