sbs sbs - 5 months ago 14
Ruby Question

How to check if a string contains ASCII code

Given the string "A\xC3B", it can be converted to a UTF-8 string like this (ref link):

"A\xC3B".force_encoding('iso-8859-1').encode('utf-8') #=> "AÃB"


However, I only want to perform the conversion if the string contains the ASCII code, namely \xC3. How can I check for that?

I tried "A\xC3B".include?("\x"), but it doesn't work.

Answer

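\x is just the prefix of a hexadecimal escape sequence in a string literal. "\xC3" stands for a single byte with the value 0xC3, and the escape is resolved when the literal is parsed, so the string never contains a backslash followed by an x. That is also why "\x" on its own doesn't work: without hex digits it is an invalid escape, and Ruby rejects it with a SyntaxError. The escape has nothing to do with encodings on its own. US-ASCII covers "\x00" to "\x7F" (e.g. "\x41" is the same as "A", and "\x30" is "0"). The remaining bytes, "\x80" to "\xFF", are not US-ASCII characters, because US-ASCII is a 7-bit character set.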
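A small sketch that checks these points directly (the variable name str is only for illustration): each escape becomes a byte when the literal is parsed, so the sample string is three bytes long, and the 0xC3 byte can be found by looking at the byte values.

```ruby
# The escapes are resolved when each literal is parsed.
p "\x41" == "A"            # => true
p "\x30" == "0"            # => true

# "\xC3" is one byte with the value 0xC3 (195), not the characters \, x, C, 3.
str = "A\xC3B"
p str.bytesize             # => 3
p str.bytes                # => [65, 195, 66]

# Looking for that byte value works on the byte array.
p str.bytes.include?(0xC3) # => true
```

Comparing byte values avoids depending on how string methods treat the lone \xC3, which is not valid UTF-8 by itself.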

If you want to check if a string contains only US-ASCII characters, call String#ascii_only?:

p "A\xC3B".ascii_only? # => false
p "\x41BC".ascii_only? # => true
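To see roughly what ascii_only? is testing, here is a byte-level approximation: a string is ASCII-only when none of its bytes is above 0x7F. This is only an illustration with a hypothetical helper name, non_ascii_byte?; it is meaningful for ASCII-compatible encodings such as UTF-8 or ISO-8859-1 (ascii_only? also returns false for non-ASCII-compatible encodings such as UTF-16), so in practice ascii_only? is the method to call.

```ruby
# Hypothetical helper: true if any byte lies outside the 7-bit
# US-ASCII range (0x00-0x7F). Only meaningful for ASCII-compatible
# encodings such as UTF-8 or ISO-8859-1.
def non_ascii_byte?(str)
  str.each_byte.any? { |byte| byte > 0x7F }
end

p non_ascii_byte?("A\xC3B") # => true
p non_ascii_byte?("\x41BC") # => false
p "A\xC3B".ascii_only?      # => false
```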

Another example based on your code:

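str = "A\xC3B"
# Only reinterpret the bytes when something lies outside 0x00-0x7F
unless str.ascii_only?
  # force_encoding relabels the bytes as ISO-8859-1 (no conversion);
  # encode! then converts them to UTF-8, modifying str in place
  str.force_encoding(Encoding::ISO_8859_1).encode!(Encoding::UTF_8)
end
p str.encoding # => #<Encoding:UTF-8>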
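If you need this in more than one place, it could be wrapped in a helper. The sketch below uses a hypothetical method name, latin1_to_utf8, and differs from the code above in that it returns a new string instead of mutating its argument, so it also works when the input is frozen.

```ruby
# Hypothetical helper: returns +str+ re-read as ISO-8859-1 and converted
# to UTF-8. ASCII-only strings are returned unchanged (same object).
def latin1_to_utf8(str)
  return str if str.ascii_only?

  # dup so the caller's string, which may be frozen, is left untouched;
  # force_encoding only relabels the bytes, encode does the conversion.
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

converted = latin1_to_utf8("A\xC3B".freeze)
p converted.encoding      # => #<Encoding:UTF-8>
p converted == "A\u00C3B" # => true
```

One caveat worth checking against your input: a string that is already valid UTF-8, such as "é", is also not ascii_only?, so it would be converted a second time and come out as "Ã©". If that can happen, guarding with str.valid_encoding? instead (it is false for "A\xC3B" and true for "é") may be the better test.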