Jonathon Nordquist - 5 months ago
Ruby Question

Trying to understand the Ruby .chr and .ord methods

I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.

My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a single-character string like "A" and call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character "A". This tells me that Ruby has an ordered collection of character values somewhere, and that it can use this collection to give me the position of a specific character, or the character at a specific position. I may be wrong about this; please correct me if I am.

Now I also understand that Ruby's default character encoding uses UTF-8 so it can work with thousands of possible characters. Thus if I ask it for something like this:

'好'.ord


I get the position of that character, which is 22909. However, if I call chr on that value:

22909.chr


I get "RangeError: 22909 out of char range." I'm only able to get chr to work on values up to 255, which is extended ASCII. So my questions are:


  • Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?

  • Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?

  • If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?


Answer

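According to the Integer#chr documentation, calling chr with no argument returns a US-ASCII string for values below 128 and an ASCII-8BIT string for values up to 255 (unless Encoding.default_internal is set), which is why anything larger raises a RangeError. ord, on the other hand, uses the encoding of the string it is called on, and string literals are UTF-8 by default. To get a UTF-8 character back, pass the encoding explicitly: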

22909.chr(Encoding::UTF_8)
#=> "好"
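To see which encoding chr picks in each case, here is a small sketch. The helper name default_chr_encoding is mine, not part of Ruby, and the comments assume Encoding.default_internal is nil, which is the usual default.

```ruby
# Hypothetical helper: report the encoding Integer#chr chooses when it is
# called without an argument, or :range_error if the call is rejected.
def default_chr_encoding(codepoint)
  codepoint.chr.encoding
rescue RangeError
  :range_error
end

p default_chr_encoding(65)     # US-ASCII when default_internal is nil
p default_chr_encoding(200)    # ASCII-8BIT (binary) when default_internal is nil
p default_chr_encoding(22909)  # :range_error when default_internal is nil

# With an explicit encoding the round trip through ord works.
p 22909.chr(Encoding::UTF_8).ord == 22909
```

So chr is not really reading from an "extended ASCII" table; without an argument it just produces a single byte, and anything that doesn't fit in one byte is rejected.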

To list all available encoding names:

Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
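Any of those names can be passed to chr, and ord always answers in terms of the string's own encoding. A rough illustration follows; ord_in is a made-up helper, and Windows-1252 is just one encoding I picked because the euro sign sits at a different position there than in Unicode.

```ruby
# Hypothetical helper: transcode a one-character string to the named
# encoding, then ask for its ordinal in that encoding.
def ord_in(char, encoding_name)
  char.encode(encoding_name).ord
end

euro = "\u20AC" # the euro sign as a UTF-8 string

p ord_in(euro, "UTF-8")         # 8364, the Unicode code point U+20AC
p ord_in(euro, "Windows-1252")  # 128, the single byte 0x80 in that code page

# Going the other way: build the character in Windows-1252, then
# transcode it to UTF-8 to compare it with the original string.
p 128.chr("Windows-1252").encode("UTF-8") == euro
```

This is why the same character can give different ord values depending on how the string is encoded.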

A hacky way to count how many code points are valid in UTF-8 (try every integer up to some large limit and count the ones chr accepts):

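# Count the integers below 2,000,000 that Integer#chr accepts as UTF-8.
2000000.times.count do |i|
  begin
    i.chr(Encoding::UTF_8)
    true
  rescue RangeError
    # Surrogates (0xD800..0xDFFF) and values above 0x10FFFF end up here.
    false
  end
end
#=> 1112064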
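That number can also be worked out without the loop: Unicode code points run from 0 to 0x10FFFF, and the surrogate range 0xD800..0xDFFF is not valid in UTF-8. A minimal sketch (the method name is my own):

```ruby
# Hypothetical helper: number of code points that can be encoded in UTF-8,
# i.e. every code point from 0 to 0x10FFFF except the surrogates.
def valid_utf8_code_point_count
  total_code_points = 0x10FFFF + 1
  surrogates = (0xD800..0xDFFF).size
  total_code_points - surrogates
end

p valid_utf8_code_point_count #=> 1112064

# Spot check at the edge of the surrogate range: the last code point
# before it is accepted, the first surrogate is rejected.
p 0xD7FF.chr(Encoding::UTF_8).ord == 0xD7FF
begin
  0xD800.chr(Encoding::UTF_8)
rescue RangeError => e
  puts "rejected: #{e.message}"
end
```

Note this only applies to UTF-8; for single-byte encodings the limit is 256 values at most, and not every byte is necessarily assigned a character.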