Zombies Zombies - 8 months ago 47
Ruby Question

In Ruby, how to convert special characters like ë,à,é,ä all to e,a,e,a?

I want to convert characters like ë, à, é, ä to just plain e, a, e, a. I am looking to convert with regard to language and how people type city names. For example, most people type Brasilia when searching for it, instead of Brasília. And when news agencies like Reuters report on Brasília, they usually spell it Brasilia. So I am looking for any gem (or, probably better, a character-encoding method, since that answer can be used for reference in other languages).

This is just to handle the typical "extended ASCII" character sets. Note: I am working with standard Unicode strings.


Starting with Ruby 2.2, there is String#unicode_normalize to normalize Unicode strings. The NFKD form decomposes an accented character into its base character and a separate combining mark:

'ë'.unicode_normalize(:nfkd).chars
#=> ["e", "̈"]
#     ^    ^
#   base  combining mark
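
To make the decomposition easier to see, here is a minimal sketch that prints the code points of the NFKD form. The helper name `nfkd_codepoints` is made up for illustration and is not part of Ruby; it assumes the input is a Unicode (e.g. UTF-8) string.

```ruby
# Hypothetical helper: list the code points of a string's NFKD form.
def nfkd_codepoints(str)
  str.unicode_normalize(:nfkd).codepoints.map { |cp| format('U+%04X', cp) }
end

p nfkd_codepoints('ë')
#=> ["U+0065", "U+0308"]   (the letter e, then COMBINING DIAERESIS)

# Only the base letter is ASCII; the combining mark is not.
p 'ë'.unicode_normalize(:nfkd).chars.map(&:ascii_only?)
#=> [true, false]
```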

Since the base character is a valid ASCII code point and the combining mark is not, encoding to ASCII with an empty replacement string removes the mark:

'ë,à,é,ä'.unicode_normalize(:nfkd).encode('ASCII', replace: '')
#=> "e,a,e,a"
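
For the city-name use case from the question, the same idea can be wrapped in a small method. This is only a sketch: `strip_diacritics` is a hypothetical name, and it spells out `undef: :replace` explicitly so the intent (drop anything that cannot be represented in ASCII) is visible. It assumes the input is a valid Unicode string.

```ruby
# Hypothetical helper (not part of Ruby's standard library): decompose with
# NFKD, then drop every character that cannot be represented in ASCII.
def strip_diacritics(str)
  str.unicode_normalize(:nfkd)
     .encode('ASCII', undef: :replace, replace: '')
end

puts strip_diacritics('Brasília')  # prints "Brasilia"
puts strip_diacritics('ë,à,é,ä')   # prints "e,a,e,a"
puts strip_diacritics('Tromsø')    # prints "Troms"
```

One caveat, as the last line shows: letters such as ø have no Unicode decomposition into a base letter plus a mark, so this approach removes them entirely instead of mapping them to a lookalike. Handling those would need an explicit mapping table or a transliteration gem.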