papasmurf - 4 days ago

IO encoding errors in Ruby

I know Ruby has a really bad rap when it comes to pulling stuff from the web, and it gets a lot of encoding errors and such. How can I force the encoding of the array below back to its true form?

["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]


First I tried encoding to UTF-8:

irb(main):012:0> data = ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]
irb(main):013:0> data.each do |char|
irb(main):014:1* puts char.encode!("UTF-8", invalid: :replace, undef: :replace)
irb(main):015:1> end
0x4E
0x3C
0x89
0x50
0xC3
0x47
0xFF
0x70
xFF
0x2F
0xA2
0xB3
0x98
=> ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]


So it seems the characters are already UTF-8. Next I tried ISO-8859-1:

irb(main):086:0> data.each { |char|
irb(main):087:1* puts char.encode!("iso-8859-1", invalid: :replace, undef: :replace)
irb(main):088:1> }
x4E
x3C
x89
x50
xC3
x47
xFF
x70
xFF
x2F
xA2
xB3
x98
=> ["x4E", "x3C", "x89", "x50", "xC3", "x47", "xFF", "x70", "xFF", "x2F", "xA2", "xB3", "x98"]


That also did not work, although it seems to have dropped the 0s.

So I went out on a limb and tried it with URI.decode:

irb(main):093:0> require 'uri'
=> true
irb(main):094:0> data.each { |char|
irb(main):095:1* puts URI.decode(char)
irb(main):096:1> }
x4E
x3C
x89
x50
xC3
x47
xFF
x70
xFF
x2F
xA2
xB3
x98
=> ["x4E", "x3C", "x89", "x50", "xC3", "x47", "xFF", "x70", "xFF", "x2F", "xA2", "xB3", "x98"]


And wouldn't you know it? It didn't work.
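URI.decode only unescapes percent-encoded sequences such as %4E, so strings like "0x4E" or "x4E" pass through it unchanged. (URI.decode was also marked obsolete and removed in Ruby 3.0; URI.decode_www_form_component from the same library is one replacement.) The sketch below assumes, hypothetically, that the values started out as percent escapes in the URL, and uses only the ASCII-range values so the decoded result is printable:

```ruby
require "uri"

# Hypothetical subset of the data: only values below 0x80 (plain ASCII).
values = ["0x4E", "0x3C", "0x50", "0x47", "0x70", "0x2F"]

# Assumption: rewrite "0x4E" as the percent escape "%4E" that a URL would carry.
escaped = values.map { |v| v.sub(/\A0?x/i, "%") }.join
p escaped                                  # => "%4E%3C%50%47%70%2F"

# decode_www_form_component turns each %XX escape back into its byte.
p URI.decode_www_form_component(escaped)   # => "N<PGp/"

# Text without percent escapes passes through untouched, as in the irb session.
p URI.decode_www_form_component("0x4E")    # => "0x4E"
```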

Is there a way to get the characters back to their original form? If it helps, this came from a URL, but I no longer have the full URL.

Answer

Your array

["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]

is an array of strings, and each string has four characters (except "xFF", which has three). The first string is "0x4E": a zero, a lowercase x, a 4 and an E.
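You can confirm this in irb by looking at the characters and bytes of one element. Every character is plain ASCII, and ASCII text has the same bytes in UTF-8 and ISO-8859-1, which is why String#encode appears to do nothing in the attempts above:

```ruby
s = "0x4E"

p s.chars        # => ["0", "x", "4", "E"]
p s.length       # => 4
p s.bytes        # => [48, 120, 52, 69]  (ASCII codes for '0', 'x', '4', 'E')
p s.ascii_only?  # => true

# Converting ASCII-only text between UTF-8 and ISO-8859-1 leaves the bytes unchanged.
p s.encode("ISO-8859-1").bytes == s.bytes   # => true
```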

You probably want an array of integers written as hex literals instead, like:

data = [0x4E, 0x3C, 0x89, 0x50, 0xC3, 0x47, 0xFF, 0x70, 0xFF, 0x2F, 0xA2, 0xB3, 0x98]
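The integers above are written out by hand. A minimal sketch of deriving them from the original strings instead: strip the hex prefix and parse the remaining digits in base 16. The pattern also accepts the ninth entry, "xFF", which is assumed here to be a truncated "0xFF".

```ruby
strings = ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF",
           "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]

# Remove an optional "0" followed by "x", then parse the rest as hexadecimal.
# Integer() raises ArgumentError on malformed input instead of silently returning 0.
data = strings.map { |s| Integer(s.sub(/\A0?x/i, ""), 16) }

p data
# => [78, 60, 137, 80, 195, 71, 255, 112, 255, 47, 162, 179, 152]
```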

To get the corresponding characters, you can use Integer#chr:

p data.map { |c| c.chr } #-> ["N", "<", "\x89", "P", "\xC3", "G", "\xFF", "p", "\xFF", "/", "\xA2", "\xB3", "\x98"]

These characters can be encoded, but every byte above 0x7F comes out as a replacement character:

p data.map { |char|
  char.chr.encode('utf-8', invalid: :replace, undef: :replace)
}    #["N", "<", "\uFFFD", "P", "\uFFFD", "G", "\uFFFD", "p", "\uFFFD", "/", "\uFFFD", "\uFFFD", "\uFFFD"]


p data.map { |char|
  char.chr.encode('iso-8859-1', invalid: :replace, undef: :replace)
} #["N", "<", "?", "P", "?", "G", "?", "p", "?", "/", "?", "?", "?"]
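The replacements happen because Integer#chr returns a binary (ASCII-8BIT) string for values above 0x7F, and Ruby cannot map such a byte to UTF-8 or ISO-8859-1 without being told what it means. A sketch of an alternative: pack the values into one byte string, check whether it is valid UTF-8, and if not, reinterpret it under an assumed source encoding. ISO-8859-1 is only a guess here; the real encoding of the data from the URL is unknown.

```ruby
data = [0x4E, 0x3C, 0x89, 0x50, 0xC3, 0x47, 0xFF, 0x70, 0xFF, 0x2F, 0xA2, 0xB3, 0x98]

# "C*" packs each integer as one unsigned 8-bit byte; the result is binary.
raw = data.pack("C*")
p raw.encoding == Encoding::BINARY   # => true

# Relabel (without converting) a copy of the bytes as UTF-8 and test them.
p raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?   # => false

# Assumption: the bytes are ISO-8859-1. Every byte is defined in that
# encoding, so converting to UTF-8 needs no replacement characters.
utf8 = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
p utf8.valid_encoding?               # => true

# In ISO-8859-1 each byte maps to the Unicode code point with the same value.
p utf8.codepoints == data            # => true
```

If the bytes turn out to be valid in some other encoding, swap that encoding in for ISO-8859-1 in the force_encoding call.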